LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc: Fix build errors STRICT_MM_TYPECHECKS
From: Aneesh Kumar K.V @ 2013-05-06 20:51 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pte-hash64-64k.h | 2 +-
 arch/powerpc/kernel/pci-common.c          | 5 ++---
 arch/powerpc/mm/init_64.c                 | 3 ++-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/pte-hash64-64k.h b/arch/powerpc/include/asm/pte-hash64-64k.h
index 7ad8ef1..98e27be 100644
--- a/arch/powerpc/include/asm/pte-hash64-64k.h
+++ b/arch/powerpc/include/asm/pte-hash64-64k.h
@@ -58,7 +58,7 @@
  * generic accessors and iterators here
  */
 #define __real_pte(e,p) 	((real_pte_t) { \
-			(e), ((e) & _PAGE_COMBO) ? \
+			(e), (pte_val(e) & _PAGE_COMBO) ? \
 				(pte_val(*((p) + PTRS_PER_PTE))) : 0 })
 #define __rpte_to_hidx(r,index)	((pte_val((r).pte) & _PAGE_COMBO) ? \
         (((r).hidx >> ((index)<<2)) & 0xf) : ((pte_val((r).pte) >> 12) & 0xf))
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f325dc9..fd4b917 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -359,7 +359,6 @@ static pgprot_t __pci_mmap_set_pgprot(struct pci_dev *dev, struct resource *rp,
 				      enum pci_mmap_state mmap_state,
 				      int write_combine)
 {
-	unsigned long prot = pgprot_val(protection);
 
 	/* Write combine is always 0 on non-memory space mappings. On
 	 * memory space, if the user didn't pass 1, we check for a
@@ -376,9 +375,9 @@ static pgprot_t __pci_mmap_set_pgprot(struct pci_dev *dev, struct resource *rp,
 
 	/* XXX would be nice to have a way to ask for write-through */
 	if (write_combine)
-		return pgprot_noncached_wc(prot);
+		return pgprot_noncached_wc(protection);
 	else
-		return pgprot_noncached(prot);
+		return pgprot_noncached(protection);
 }
 
 /*
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index c9ec657..d0cd9e4 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -218,7 +218,8 @@ static void __meminit vmemmap_create_mapping(unsigned long start,
 					     unsigned long phys)
 {
 	int  mapped = htab_bolt_mapping(start, start + page_size, phys,
-					PAGE_KERNEL, mmu_vmemmap_psize,
+					pgprot_val(PAGE_KERNEL),
+					mmu_vmemmap_psize,
 					mmu_kernel_ssize);
 	BUG_ON(mapped < 0);
 }
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH 2/5] KVM: PPC: iommu: Add missing kvm_iommu_map_pages/kvm_iommu_unmap_pages
From: Alex Williamson @ 2013-05-06 21:07 UTC (permalink / raw)
  To: aik
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, Paul Mackerras,
	linux-pci, linuxppc-dev, David Gibson
In-Reply-To: <51875a36.25ac440a.50d9.ffffe90d@mx.google.com>

On Mon, 2013-05-06 at 17:21 +1000, aik@ozlabs.ru wrote:
> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> The IOMMU API implements groups creating/deletion, device binding
> and IOMMU map/unmap operations.
> 
> The PowerPC implementation uses most of the API except map/unmap
> operations, which are implemented on POWER using hypercalls.
> 
> However, in order to link a kernel with the CONFIG_IOMMU_API enabled,
> the empty kvm_iommu_map_pages/kvm_iommu_unmap_pages have to be
> defined, so this defines them.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/include/asm/kvm_host.h |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index b6a047e..c025d91 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -603,4 +603,18 @@ struct kvm_vcpu_arch {
>  
>  #define __KVM_HAVE_ARCH_WQP
>  
> +#ifdef CONFIG_IOMMU_API
> +/* POWERPC does not use IOMMU API for mapping/unmapping */
> +static inline int kvm_iommu_map_pages(struct kvm *kvm,
> +		struct kvm_memory_slot *slot)
> +{
> +	return 0;
> +}
> +
> +static inline void kvm_iommu_unmap_pages(struct kvm *kvm,
> +		struct kvm_memory_slot *slot)
> +{
> +}
> +#endif /* CONFIG_IOMMU_API */
> +
>  #endif /* __POWERPC_KVM_HOST_H__ */

This is no longer needed, Gleb applied my patch for 3.10 that make all
of KVM device assignment dependent on a build config option and the top
level kvm_host.h now includes this when that is not set.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH] powerpc, perf: Fix processing conditions for invalid BHRB entries
From: Michael Neuling @ 2013-05-06 21:22 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <5187A144.9080307@linux.vnet.ibm.com>

Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:

> On 05/06/2013 04:41 PM, Michael Neuling wrote:
> > Anshuman Khandual <khandual@linux.vnet.ibm.com> wrote:
> > 
> >> Fixing some conditions during BHRB entry processing.
> > 
> > I think we can simplify this a lot more... something like the below.
> > 
> 
> I feel that the conditional handling of the invalid BHRB entries should be
> present which would help us during the debug and also could be used for more
> granular branch classification or error handling later on.

If we need that, then we can add it later on.  You'd still need to
modify the code to do any debug, branch classification or error handling
as the code is now.  It's just an unnecessary place holder.  

The code is unnecessarily verbose now.  This patch removes about 20
lines of code and does the same thing.

Mikey

> 
> > Also, this marks the "to" address as all 1s, which is better poison
> > value since it's possible to branch to/from 0x0.
> >
> 
> Agreed.
> 
> 
> > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> > index c627843..d410d65 100644
> > --- a/arch/powerpc/perf/core-book3s.c
> > +++ b/arch/powerpc/perf/core-book3s.c
> > @@ -1463,65 +1463,45 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
> >  {
> >  	u64 val;
> >  	u64 addr;
> > -	int r_index, u_index, target, pred;
> > +	int r_index, u_index, pred;
> > 
> >  	r_index = 0;
> >  	u_index = 0;
> >  	while (r_index < ppmu->bhrb_nr) {
> >  		/* Assembly read function */
> > -		val = read_bhrb(r_index);
> > +		val = read_bhrb(r_index++);
> > 
> >  		/* Terminal marker: End of valid BHRB entries */
> > -		if (val == 0) {
> > +		if (!val) {
> >  			break;
> >  		} else {
> >  			/* BHRB field break up */
> >  			addr = val & BHRB_EA;
> >  			pred = val & BHRB_PREDICTION;
> > -			target = val & BHRB_TARGET;
> > 
> > -			/* Probable Missed entry: Not applicable for POWER8 */
> > -			if ((addr == 0) && (target == 0) && (pred == 1)) {
> > -				r_index++;
> > +			if (!addr)
> > +				/* invalid entry */
> >  				continue;
> > -			}
> > -
> > -			/* Real Missed entry: Power8 based missed entry */
> > -			if ((addr == 0) && (target == 1) && (pred == 1)) {
> > -				r_index++;
> > -				continue;
> > -			}
> > 
> > -			/* Reserved condition: Not a valid entry  */
> > -			if ((addr == 0) && (target == 1) && (pred == 0)) {
> > -				r_index++;
> > -				continue;
> > -			}
> > -
> > -			/* Is a target address */
> >  			if (val & BHRB_TARGET) {
> >  				/* First address cannot be a target address */
> > -				if (r_index == 0) {
> > -					r_index++;
> > +				if (r_index == 0)
> >  					continue;
> > -				}
> > 
> >  				/* Update target address for the previous entry */
> >  				cpuhw->bhrb_entries[u_index - 1].to = addr;
> >  				cpuhw->bhrb_entries[u_index - 1].mispred = pred;
> >  				cpuhw->bhrb_entries[u_index - 1].predicted = ~pred;
> > -
> > -				/* Dont increment u_index */
> > -				r_index++;
> >  			} else {
> >  				/* Update address, flags for current entry */
> >  				cpuhw->bhrb_entries[u_index].from = addr;
> > +				cpuhw->bhrb_entries[u_index].to =
> > +					0xffffffffffffffff;
> >  				cpuhw->bhrb_entries[u_index].mispred = pred;
> >  				cpuhw->bhrb_entries[u_index].predicted = ~pred;
> > 
> >  				/* Successfully popullated one entry */
> >  				u_index++;
> > -				r_index++;
> >  			}
> >  		}
> >  	}
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > 
> 

^ permalink raw reply

* Re: [PATCH 2/3] powerpc/powernv: Disable IO space for PCI buses
From: Benjamin Herrenschmidt @ 2013-05-06 21:33 UTC (permalink / raw)
  To: Gavin Shan; +Cc: yinghai, linux-pci, bhelgaas, linuxppc-dev
In-Reply-To: <1367847858-6506-2-git-send-email-shangw@linux.vnet.ibm.com>

On Mon, 2013-05-06 at 21:44 +0800, Gavin Shan wrote:
> The patch intends to set the special flag (PCI_BUS_FLAGS_NO_IO) for
> root buses so PCI core will skip assignment for IO stuff. Besides,
> we also clear the IO resources on all PCI devices for PHB3.

Why the new hook ? Can't this be detected simply because there is
no aperture in the pci_host_bridge with IORESOURCE_IO set in the flags ?

Ben.

> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/machdep.h        |    3 ++
>  arch/powerpc/kernel/pci-common.c          |    7 +++++
>  arch/powerpc/platforms/powernv/pci-ioda.c |   38 +++++++++++++++++++++++++++++
>  3 files changed, 48 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index 3f3f691..ebc2ffd 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -220,6 +220,9 @@ struct machdep_calls {
>  	/* Called during PCI resource reassignment */
>  	resource_size_t (*pcibios_window_alignment)(struct pci_bus *, unsigned long type);
>  
> +	/* Called when adding PCI bus */
> +	void (*pcibios_add_bus)(struct pci_bus *);
> +
>  	/* Called to shutdown machine specific hardware not already controlled
>  	 * by other drivers.
>  	 */
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index f325dc9..7b8a6f1 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -120,6 +120,13 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>  	return 1;
>  }
>  
> +/* The function will be called while adding PCI bus to system */
> +void pcibios_add_bus(struct pci_bus *bus)
> +{
> +	if (ppc_md.pcibios_add_bus)
> +		ppc_md.pcibios_add_bus(bus);
> +}
> +
>  static resource_size_t pcibios_io_size(const struct pci_controller *hose)
>  {
>  #ifdef CONFIG_PPC64
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 8c6c9cf..0c3fa29 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1015,6 +1015,40 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
>  	return phb->ioda.io_segsize;
>  }
>  
> +/*
> + * The function will be called while adding PCI bus to the
> + * system. In turn, we should set flag to indicate that the
> + * root bus doesn't have IO resources.
> + */
> +static void pnv_pci_add_bus(struct pci_bus *bus)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(bus);
> +	struct pnv_phb *phb = hose->private_data;
> +
> +	/*
> +	 * We only need set the flag for root bus since the
> +	 * bus flags are copied over from parent to children
> +	 */
> +	if (pci_is_root_bus(bus) &&
> +	    phb->model == PNV_PHB_MODEL_PHB3)
> +		bus->bus_flags |= PCI_BUS_FLAGS_NO_IO;
> +}
> +
> +static void pnv_pci_fixup_resources(struct pci_dev *dev)
> +{
> +	int i;
> +	struct resource *res;
> +
> +	if (!(dev->bus->bus_flags & PCI_BUS_FLAGS_NO_IO))
> +		return;
> +
> +	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
> +		res = dev->resource + i;
> +		if (res->flags & IORESOURCE_IO)
> +			res->flags = 0;
> +	}
> +}
> +
>  /* Prevent enabling devices for which we couldn't properly
>   * assign a PE
>   */
> @@ -1189,6 +1223,10 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
>  	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
>  	ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
>  	ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
> +	if (ioda_type == PNV_PHB_IODA2) {
> +		ppc_md.pcibios_add_bus = pnv_pci_add_bus;
> +		ppc_md.pcibios_fixup_resources = pnv_pci_fixup_resources;
> +	}
>  	pci_add_flags(PCI_REASSIGN_ALL_RSRC);
>  
>  	/* Reset IODA tables to a clean state */

^ permalink raw reply

* Re: [PATCH 3/3] powerpc/powernv: Don't configure IO window on PHB3
From: Benjamin Herrenschmidt @ 2013-05-06 21:34 UTC (permalink / raw)
  To: Gavin Shan; +Cc: yinghai, linux-pci, bhelgaas, linuxppc-dev
In-Reply-To: <1367847858-6506-3-git-send-email-shangw@linux.vnet.ibm.com>

On Mon, 2013-05-06 at 21:44 +0800, Gavin Shan wrote:
> We needn't configure IO windows for the corresponding PEs on PHB3
> since that doesn't support IO.

Here too, no need for such a flag, just check that
pci_controller->io_resource.flags is 0.

BTW. Please work on top of the patch I sent already that avoids adding
bogus resources to pci_host_bridge when their flags are 0. I'll send
it to Linus today.

Cheers,
Ben.

> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 0c3fa29..b4f3edb 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -894,7 +894,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
>  		    res->start > res->end)
>  			continue;
>  
> -		if (res->flags & IORESOURCE_IO) {
> +		/* We needn't setup IO windows for PHB3 */
> +		if (!(pe->pbus->bus_flags & PCI_BUS_FLAGS_NO_IO) &&
> +		    res->flags & IORESOURCE_IO) {
>  			region.start = res->start - phb->ioda.io_pci_base;
>  			region.end   = res->end - phb->ioda.io_pci_base;
>  			index = region.start / phb->ioda.io_segsize;

^ permalink raw reply

* Re: [PATCH] arch/powerpc: advertise ISA2.07, HTM, DSCR, EBB and ISEL bits in HWCAP2
From: Benjamin Herrenschmidt @ 2013-05-06 21:37 UTC (permalink / raw)
  To: Ryan Arnold
  Cc: linuxppc-dev, Nishanth Aravamudan, Steve Munroe, Peter Bergner,
	Michael Neuling, Michael R Meissner
In-Reply-To: <OFBCA2593E.F64367AB-ON86257B63.004ED7A8-86257B63.00506FC6@us.ibm.com>

On Mon, 2013-05-06 at 09:38 -0500, Ryan Arnold wrote:
> My understanding was that these bits being 'on' is an indication of
> what features the hardware supports (or what the kernel emulates) and
> a not an indication of whether that facility is currently enabled or
> not.  If the hardware supports a particular feature but it is not
> enabled I'd expect that user-space usage of that feature would cause
> the kernel to trap on a facility availability exception (which is how
> Altivec/VMX is implemented, being defaulted to turned off).

Right but the discussion is about whether we should expose the bits
when the kernel doesn't have the ability to handle the feature :-)

IE. We need to remove the HTM feature if the kernel is compiled without
transactional memory support.

Similarily, Nish, you may need to check that we remove those bits if
pHyp has the partition in a mode that doesn't support them (P7
compatibility for example) for migration purposes.

Cheers,
Ben.

> Otherwise there's no way I could know whether an ISA [optional]
> feature is actually available on a particular machine.
> 
> And yes, the bits can't change.  My usage of hwcap.h has to coincide
> with the kernel's asm/cputable.h

^ permalink raw reply

* Re: [PATCH v2 1/4] powerpc/cputable: reserve bits in HWCAP2 for new features
From: Benjamin Herrenschmidt @ 2013-05-06 21:41 UTC (permalink / raw)
  To: Ryan Arnold
  Cc: Michael Neuling, Nishanth Aravamudan, Steve Munroe, Peter Bergner,
	linuxppc-dev, Michael R Meissner
In-Reply-To: <OFA8304853.25190807-ON86257B63.0067EDA3-86257B63.00690C34@us.ibm.com>

On Mon, 2013-05-06 at 14:07 -0500, Ryan Arnold wrote:
> Notice that I changed DSCR to DSC.  The 'R' wasn't descriptive.

The "R" is the name of the register for which we are exposing the
availability to userspace... it's also the name of the sysfs entry so
I'd rather keep it for consistency.

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Scott Wood @ 2013-05-06 23:50 UTC (permalink / raw)
  To: tiejun.chen; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <51871FCD.9070900@windriver.com>

On 05/05/2013 10:13:17 PM, tiejun.chen wrote:
> On 05/06/2013 11:10 AM, Tiejun Chen wrote:
>> For the external interrupt, the decrementer exception and the =20
>> doorbell
>> excpetion, we also need to soft-disable interrupts while doing as =20
>> host
>> interrupt handlers since the DO_KVM hook is always performed to skip
>> EXCEPTION_COMMON then miss this original chance with the 'ints' =20
>> (INTS_DISABLE).

http://patchwork.ozlabs.org/patch/241344/
http://patchwork.ozlabs.org/patch/241412/

:-)

>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> ---
>>   arch/powerpc/kvm/bookehv_interrupts.S |    9 +++++++++
>>   1 file changed, 9 insertions(+)
>>=20
>> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S =20
>> b/arch/powerpc/kvm/bookehv_interrupts.S
>> index e8ed7d6..2fd62bf 100644
>> --- a/arch/powerpc/kvm/bookehv_interrupts.S
>> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
>> @@ -33,6 +33,8 @@
>>=20
>>   #ifdef CONFIG_64BIT
>>   #include <asm/exception-64e.h>
>> +#include <asm/hw_irq.h>
>> +#include <asm/irqflags.h>
>>   #else
>>   #include "../kernel/head_booke.h" /* for THREAD_NORMSAVE() */
>>   #endif
>> @@ -469,6 +471,13 @@ _GLOBAL(kvmppc_resume_host)
>>   	PPC_LL	r3, HOST_RUN(r1)
>>   	mr	r5, r14 /* intno */
>>   	mr	r14, r4 /* Save vcpu pointer. */
>> +#ifdef CONFIG_64BIT
>> +	/* Should we soft-disable interrupts? */
>> +	andi.	r6, r5, BOOKE_INTERRUPT_EXTERNAL | =20
>> BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL
>> +	beq	skip_soft_dis
>> +	SOFT_DISABLE_INTS(r7,r8)
>> +skip_soft_dis:
>> +#endif

Why wouldn't we always disable them?  kvmppc_handle_exit() will enable =20
interrupts when it's ready.

-Scott=

^ permalink raw reply

* Re: [PATCH] kvm/ppc/booke64: Hard disable interrupts when entering the guest
From: Scott Wood @ 2013-05-06 23:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm, Alexander Graf, kvm-ppc, Mihai Caraman, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <1367787788.11982.58.camel@pasglop>

On 05/05/2013 04:03:08 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2013-05-03 at 18:45 -0500, Scott Wood wrote:
> > kvmppc_lazy_ee_enable() was causing interrupts to be soft-enabled
> > (albeit hard-disabled) in kvmppc_restart_interrupt().  This led to
> > warnings, and possibly breakage if the interrupt state was later =20
> saved
> > and then restored (leading to interrupts being hard-and-soft enabled
> > when they should be at least soft-disabled).
> >
> > Simply removing kvmppc_lazy_ee_enable() leaves interrupts only
> > soft-disabled when we enter the guest, but they will be =20
> hard-disabled
> > when we exit the guest -- without PACA_IRQ_HARD_DIS ever being set, =20
> so
> > the local_irq_enable() fails to hard-enable.
> >
> > While we could just set PACA_IRQ_HARD_DIS after an exit to =20
> compensate,
> > instead hard-disable interrupts before entering the guest.  This =20
> way,
> > we won't have to worry about interactions if we take an interrupt
> > during the guest entry code.  While I don't see any obvious
> > interactions, it could change in the future (e.g. it would be bad if
> > the non-hv code were used on 64-bit or if 32-bit guest lazy =20
> interrupt
> > disabling, since the non-hv code changes IVPR among other things).
>=20
> Shouldn't the interrupts be marked soft-enabled (even if hard =20
> disabled)
> when entering the guest ?
>=20
> Ie. The last stage of entry will hard enable, so they should be
> soft-enabled too... if not, latency trackers will consider the whole
> guest periods as "interrupt disabled"...

OK... I guess we already have that problem on 32-bit as well?

> Now, kvmppc_lazy_ee_enable() seems to be clearly bogus to me. It will
> unconditionally set soft_enabled and clear irq_happened from a
> soft-disabled state, thus potentially losing a pending event.
>=20
> Book3S "HV" seems to be keeping interrupts fully enabled all the way
> until the asm hard disables, which would be fine except that I'm =20
> worried
> we are racy vs. need_resched & signals.
>=20
> One thing you may be able to do is call prep_irq_for_idle(). This will
> tell you if something happened, giving you a chance to abort/re-enable
> before you go the guest.

As long as we go straight from IRQs fully enabled to hard-disabled, =20
before we check for signals and such, I don't think we need that (and =20
using it would raise the question of what to do on 32-bit).

What if we just take this patch, and add trace_hardirqs_on() just =20
before entering the guest?  This would be similar to what the 32-bit =20
non-KVM exception return code does (except it would be in C code).  =20
Perhaps we could set soft_enabled as well, but then we'd have to clear =20
it again before calling kvmppc_restart_interrupt() -- since the KVM =20
exception handlers don't actually care about soft_enabled (it would =20
just be for consistency), I'd rather just leave soft_enabled off.

We also don't want PACA_IRQ_HARD_DIS to be cleared the way =20
prep_irq_for_idle() does, because that's what lets the =20
local_irq_enable() do the hard-enabling after we exit the guest.

-Scott=

^ permalink raw reply

* Re: [PATCH] kvm/ppc/booke64: Hard disable interrupts when entering the guest
From: Benjamin Herrenschmidt @ 2013-05-07  0:03 UTC (permalink / raw)
  To: Scott Wood
  Cc: kvm, Alexander Graf, kvm-ppc, Mihai Caraman, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <1367884418.3398.10@snotra>

On Mon, 2013-05-06 at 18:53 -0500, Scott Wood wrote:
> 
> > Ie. The last stage of entry will hard enable, so they should be
> > soft-enabled too... if not, latency trackers will consider the whole
> > guest periods as "interrupt disabled"...
> 
> OK... I guess we already have that problem on 32-bit as well?

32-bit doesn't do lazy disable, so the situation is a lot easier there.

> > Now, kvmppc_lazy_ee_enable() seems to be clearly bogus to me. It will
> > unconditionally set soft_enabled and clear irq_happened from a
> > soft-disabled state, thus potentially losing a pending event.
> > 
> > Book3S "HV" seems to be keeping interrupts fully enabled all the way
> > until the asm hard disables, which would be fine except that I'm  
> > worried
> > we are racy vs. need_resched & signals.
> > 
> > One thing you may be able to do is call prep_irq_for_idle(). This will
> > tell you if something happened, giving you a chance to abort/re-enable
> > before you go the guest.
> 
> As long as we go straight from IRQs fully enabled to hard-disabled,  
> before we check for signals and such, I don't think we need that (and  
> using it would raise the question of what to do on 32-bit).

Except that you have to mark them as "soft enabled" before you enter the
guest with interrupts on...

But yes, I see your point. If interrupts are fully enabled and you call
hard_irq_disable(), there should be no chance for anything to mess
around with irq_happened.

However if you set soft-enabled later on before the rfid that returns to
the guest and sets EE, you *must* also clear PACA_IRQ_HARD_DIS in
irq_happened. If you get that out of sync bad things will happen later
on...

To be sure all is well, you might want to
WARN_ON(get_paca()->irq_happened == PACA_IRQ_HARD_DIS); (with a comment
explaining why so).

Another problem is that hard_irq_disable() doesn't call
trace_hardirqs_off()... We might want to fix that:

static inline void hard_irq_disable(void)
{
	__hard_irq_disable();
	if (get_paca()->soft_enabled)
		trace_hardirqs_off();
	get_paca()->soft_enabled = 0;
	get_paca()->irq_happened |= PACA_IRQ_HARD_DIS;
}

> What if we just take this patch, and add trace_hardirqs_on() just  
> before entering the guest?

You still want to set soft_enabled I'd say ... though I can see how you
may get away without it as long as you call trace_hardirqs_off() right
on the way back from the guest, but beware some lockdep bits will choke
if they ever spot the discrepancy between the traced irq state and
soft_enabled. I'd recommend you just keep it in sync.

>   This would be similar to what the 32-bit  
> non-KVM exception return code does (except it would be in C code).   
> Perhaps we could set soft_enabled as well, but then we'd have to clear  
> it again before calling kvmppc_restart_interrupt() -- since the KVM  
> exception handlers don't actually care about soft_enabled (it would  
> just be for consistency), I'd rather just leave soft_enabled off.
> 
> We also don't want PACA_IRQ_HARD_DIS to be cleared the way  
> prep_irq_for_idle() does, because that's what lets the  
> local_irq_enable() do the hard-enabling after we exit the guest.

Then set it again. Don't leave the kernel in a state where soft_enabled
is 1 and irq_happened is non-zero. It might work in the specific KVM
case we are looking at now because we know we are coming back via KVM
exit and putting things right again but it's fragile, somebody will come
back and break it, etc...

If necessary, create (or improve existing) helpers that do the right
state adjustement. The cost of a couple of byte stores is negligible,
I'd rather you make sure everything remains in sync at all times.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 2/5] KVM: PPC: iommu: Add missing kvm_iommu_map_pages/kvm_iommu_unmap_pages
From: Alexey Kardashevskiy @ 2013-05-07  0:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, Paul Mackerras,
	linux-pci, linuxppc-dev, David Gibson
In-Reply-To: <1367874436.2868.90.camel@ul30vt.home>

On 05/07/2013 07:07 AM, Alex Williamson wrote:
> On Mon, 2013-05-06 at 17:21 +1000, aik@ozlabs.ru wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> The IOMMU API implements groups creating/deletion, device binding
>> and IOMMU map/unmap operations.
>>
>> The PowerPC implementation uses most of the API except map/unmap
>> operations, which are implemented on POWER using hypercalls.
>>
>> However, in order to link a kernel with the CONFIG_IOMMU_API enabled,
>> the empty kvm_iommu_map_pages/kvm_iommu_unmap_pages have to be
>> defined, so this defines them.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Cc: David Gibson <david@gibson.dropbear.id.au>
>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>> ---
>>  arch/powerpc/include/asm/kvm_host.h |   14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index b6a047e..c025d91 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -603,4 +603,18 @@ struct kvm_vcpu_arch {
>>  
>>  #define __KVM_HAVE_ARCH_WQP
>>  
>> +#ifdef CONFIG_IOMMU_API
>> +/* POWERPC does not use IOMMU API for mapping/unmapping */
>> +static inline int kvm_iommu_map_pages(struct kvm *kvm,
>> +		struct kvm_memory_slot *slot)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline void kvm_iommu_unmap_pages(struct kvm *kvm,
>> +		struct kvm_memory_slot *slot)
>> +{
>> +}
>> +#endif /* CONFIG_IOMMU_API */
>> +
>>  #endif /* __POWERPC_KVM_HOST_H__ */
> 
> This is no longer needed, Gleb applied my patch for 3.10 that make all
> of KVM device assignment dependent on a build config option and the top
> level kvm_host.h now includes this when that is not set.  Thanks,

Cannot find it, could you point me please where it is on github or
git.kernel.org? Thanks.


-- 
Alexey

^ permalink raw reply

* Re: [v1][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with E500MC
From: tiejun.chen @ 2013-05-07  1:33 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <5187C521.2070904@suse.de>

On 05/06/2013 10:58 PM, Alexander Graf wrote:
> On 05/06/2013 04:53 AM, Tiejun Chen wrote:
>> Actually E500MC also support doorbell exception, and CONFIG_PPC_E500MC
>> can cover BOOK3E/BOOK3E_64 as well.
>>
>> Signed-off-by: Tiejun Chen<tiejun.chen@windriver.com>
>> ---
>>   arch/powerpc/kvm/booke.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 1020119..dc1f590 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
>>           kvmppc_fill_pt_regs(&regs);
>>           timer_interrupt(&regs);
>>           break;
>> -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64)
>> +#if defined(CONFIG_PPC_E500MC)
>
> I suppose you mean CONFIG_KVM_E500MC here? Why didn't this work for you before?

This works for me.

Here I just mean currently CONFIG_PPC_E500MC is always selected no matter what 
CONFIG_PPC_FSL_BOOK3E or CONFIG_PPC_BOOK3E_64 is enabled. And especially, this 
is already in the arch/powerpc/kvm/booke.c file, so I think one #ifdef 
(CONFIG_PPC_E500MC) is enough and also makes sense.

> The ifdef above should cover the same range of CPUs.

Or furthermore, the #ifdef CONFIG_PPC_DOORBELL is reasonable to cover this.

Tiejun

^ permalink raw reply

* Invalid perf_branch_entry.to entries question
From: Michael Neuling @ 2013-05-07  1:35 UTC (permalink / raw)
  To: Peter Zijlstra, eranian; +Cc: Linux PPC dev, linux-kernel, Anshuman Khandual

Peter & Stephane,

We are plumbing the POWER8 Branch History Rolling Buffer (BHRB) into
struct perf_branch_entry.

Sometimes on POWER8 we may not be able to fill out the "to" address.  We
initially thought of just making this 0, but it's feasible that this
could be a valid address to branch to. 

The other logical value to indicate an invalid entry would be all 1s
which is not possible (on POWER at least).

Do you guys have a preference as to what we should use as an invalid
entry?  This would have some consequences for the userspace tool also.

The alternative would be to add a flag alongside mispred/predicted to
indicate the validity of the "to" address.

Mikey

^ permalink raw reply

* Re: [PATCH 2/5] KVM: PPC: iommu: Add missing kvm_iommu_map_pages/kvm_iommu_unmap_pages
From: Alex Williamson @ 2013-05-07  1:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, Paul Mackerras,
	linux-pci, linuxppc-dev, David Gibson
In-Reply-To: <51884FA8.6010400@ozlabs.ru>

On Tue, 2013-05-07 at 10:49 +1000, Alexey Kardashevskiy wrote:
> On 05/07/2013 07:07 AM, Alex Williamson wrote:
> > On Mon, 2013-05-06 at 17:21 +1000, aik@ozlabs.ru wrote:
> >> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>
> >> The IOMMU API implements groups creating/deletion, device binding
> >> and IOMMU map/unmap operations.
> >>
> >> The PowerPC implementation uses most of the API except map/unmap
> >> operations, which are implemented on POWER using hypercalls.
> >>
> >> However, in order to link a kernel with the CONFIG_IOMMU_API enabled,
> >> the empty kvm_iommu_map_pages/kvm_iommu_unmap_pages have to be
> >> defined, so this defines them.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> Cc: David Gibson <david@gibson.dropbear.id.au>
> >> Signed-off-by: Paul Mackerras <paulus@samba.org>
> >> ---
> >>  arch/powerpc/include/asm/kvm_host.h |   14 ++++++++++++++
> >>  1 file changed, 14 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> >> index b6a047e..c025d91 100644
> >> --- a/arch/powerpc/include/asm/kvm_host.h
> >> +++ b/arch/powerpc/include/asm/kvm_host.h
> >> @@ -603,4 +603,18 @@ struct kvm_vcpu_arch {
> >>  
> >>  #define __KVM_HAVE_ARCH_WQP
> >>  
> >> +#ifdef CONFIG_IOMMU_API
> >> +/* POWERPC does not use IOMMU API for mapping/unmapping */
> >> +static inline int kvm_iommu_map_pages(struct kvm *kvm,
> >> +		struct kvm_memory_slot *slot)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >> +static inline void kvm_iommu_unmap_pages(struct kvm *kvm,
> >> +		struct kvm_memory_slot *slot)
> >> +{
> >> +}
> >> +#endif /* CONFIG_IOMMU_API */
> >> +
> >>  #endif /* __POWERPC_KVM_HOST_H__ */
> > 
> > This is no longer needed, Gleb applied my patch for 3.10 that make all
> > of KVM device assignment dependent on a build config option and the top
> > level kvm_host.h now includes this when that is not set.  Thanks,
> 
> Cannot find it, could you point me please where it is on github or
> git.kernel.org? Thanks.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a5bab1004729f3302c776e53ee7c895b98bb1ce

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: tiejun.chen @ 2013-05-07  1:56 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1367884257.3398.9@snotra>

On 05/07/2013 07:50 AM, Scott Wood wrote:
> On 05/05/2013 10:13:17 PM, tiejun.chen wrote:
>> On 05/06/2013 11:10 AM, Tiejun Chen wrote:
>>> For the external interrupt, the decrementer exception and the doorbell
>>> excpetion, we also need to soft-disable interrupts while doing as host
>>> interrupt handlers since the DO_KVM hook is always performed to skip
>>> EXCEPTION_COMMON then miss this original chance with the 'ints' (INTS_DISABLE).
>
> http://patchwork.ozlabs.org/patch/241344/
> http://patchwork.ozlabs.org/patch/241412/
>
> :-)

I'm observing the same behaviour as well:

	WARN_ON_ONCE(!irqs_disabled());

>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>>> ---
>>>   arch/powerpc/kvm/bookehv_interrupts.S |    9 +++++++++
>>>   1 file changed, 9 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S
>>> b/arch/powerpc/kvm/bookehv_interrupts.S
>>> index e8ed7d6..2fd62bf 100644
>>> --- a/arch/powerpc/kvm/bookehv_interrupts.S
>>> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
>>> @@ -33,6 +33,8 @@
>>>
>>>   #ifdef CONFIG_64BIT
>>>   #include <asm/exception-64e.h>
>>> +#include <asm/hw_irq.h>
>>> +#include <asm/irqflags.h>
>>>   #else
>>>   #include "../kernel/head_booke.h" /* for THREAD_NORMSAVE() */
>>>   #endif
>>> @@ -469,6 +471,13 @@ _GLOBAL(kvmppc_resume_host)
>>>       PPC_LL    r3, HOST_RUN(r1)
>>>       mr    r5, r14 /* intno */
>>>       mr    r14, r4 /* Save vcpu pointer. */
>>> +#ifdef CONFIG_64BIT
>>> +    /* Should we soft-disable interrupts? */
>>> +    andi.    r6, r5, BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER
>>> | BOOKE_INTERRUPT_DOORBELL
>>> +    beq    skip_soft_dis
>>> +    SOFT_DISABLE_INTS(r7,r8)
>>> +skip_soft_dis:
>>> +#endif
>
> Why wouldn't we always disable them?  kvmppc_handle_exit() will enable
> interrupts when it's ready.

This only disable soft interrupt for kvmppc_restart_interrupt() that restarts 
interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | 
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

b. bl      kvmppc_handle_exit

c. kvmppc_handle_exit()
{
         int r = RESUME_HOST;
         int s;

         /* update before a new last_exit_type is rewritten */
         kvmppc_update_timing_stats(vcpu);

         /* restart interrupts if they were meant for the host */
         kvmppc_restart_interrupt(vcpu, exit_nr);

         local_irq_enable();	==> Enable again.
....

And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)           \
         START_EXCEPTION(label);                                         \
         NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\
         EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*)             \
	...

So I think this should be reasonable :)

Tiejun

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Scott Wood @ 2013-05-07  2:06 UTC (permalink / raw)
  To: tiejun.chen; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <51885F49.6060605@windriver.com>

On 05/06/2013 08:56:25 PM, tiejun.chen wrote:
> On 05/07/2013 07:50 AM, Scott Wood wrote:
>> On 05/05/2013 10:13:17 PM, tiejun.chen wrote:
>>> On 05/06/2013 11:10 AM, Tiejun Chen wrote:
>>>> For the external interrupt, the decrementer exception and the =20
>>>> doorbell
>>>> excpetion, we also need to soft-disable interrupts while doing as =20
>>>> host
>>>> interrupt handlers since the DO_KVM hook is always performed to =20
>>>> skip
>>>> EXCEPTION_COMMON then miss this original chance with the 'ints' =20
>>>> (INTS_DISABLE).
>>=20
>> http://patchwork.ozlabs.org/patch/241344/
>> http://patchwork.ozlabs.org/patch/241412/
>>=20
>> :-)
>=20
> I'm observing the same behaviour as well:
>=20
> 	WARN_ON_ONCE(!irqs_disabled());

So, could you explain the benefits of your approach over what's being =20
discussed in those threads?

>> Why wouldn't we always disable them?  kvmppc_handle_exit() will =20
>> enable
>> interrupts when it's ready.
>=20
> This only disable soft interrupt for kvmppc_restart_interrupt() that =20
> restarts interrupts if they were meant for the host:
>=20
> a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | =20
> BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

Those aren't the only exceptions that can end up going to the host.  We =20
could get a TLB miss that results in a heavyweight MMIO exit, etc.

And I'd rather see any fix for this problem stay out of the asm code.

> b. bl      kvmppc_handle_exit
>=20
> c. kvmppc_handle_exit()
> {
>         int r =3D RESUME_HOST;
>         int s;
>=20
>         /* update before a new last_exit_type is rewritten */
>         kvmppc_update_timing_stats(vcpu);
>=20
>         /* restart interrupts if they were meant for the host */
>         kvmppc_restart_interrupt(vcpu, exit_nr);
>=20
>         local_irq_enable();	=3D=3D> Enable again.
> ....
>=20
> And shouldn't we handle kvmppc_restart_interrupt() like the original =20
> HOST flow?
>=20
> #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, =20
> ack)           \
>         =20
> START_EXCEPTION(label);                                         \
>         NORMAL_EXCEPTION_PROLOG(trapnum, intnum, =20
> PROLOG_ADDITION_MASKABLE)\
>         EXCEPTION_COMMON(trapnum, PACA_EXGEN, =20
> *INTS_DISABLE*)             \
> 	...

Could you elaborate on what you mean?

-Scott=

^ permalink raw reply

* [PATCH] powerpc: Fix  MAX_STACK_TRACE_ENTRIES too low warning again
From: Li Zhong @ 2013-05-07  2:32 UTC (permalink / raw)
  To: PowerPC email list; +Cc: Paul Mackerras

Saw this warning again, and this time from the ret_from_fork path. 

It seems we could clear the back chain earlier in copy_thread(), which
could cover both path, and also fix potential lockdep usage in
schedule_tail(), or exception occurred before we clear the back chain.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/entry_32.S |    2 --
 arch/powerpc/kernel/entry_64.S |    2 --
 arch/powerpc/kernel/process.c  |    1 +
 3 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e514de5..d22e73e 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -439,8 +439,6 @@ ret_from_fork:
 ret_from_kernel_thread:
 	REST_NVGPRS(r1)
 	bl	schedule_tail
-	li	r3,0
-	stw	r3,0(r1)
 	mtlr	r14
 	mr	r3,r15
 	PPC440EP_ERR42
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fe5259..48e8a86 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -376,8 +376,6 @@ _GLOBAL(ret_from_fork)
 _GLOBAL(ret_from_kernel_thread)
 	bl	.schedule_tail
 	REST_NVGPRS(r1)
-	li	r3,0
-	std	r3,0(r1)
 	ld	r14, 0(r14)
 	mtlr	r14
 	mr	r3,r15
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..2c9fc5e 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -973,6 +973,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	 */
 	sp -= sizeof(struct pt_regs);
 	kregs = (struct pt_regs *) sp;
+	kregs->gpr[1] = 0;
 	sp -= STACK_FRAME_OVERHEAD;
 	p->thread.ksp = sp;
 	p->thread.ksp_limit = (unsigned long)task_stack_page(p) +
-- 
1.7.1

^ permalink raw reply related

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: tiejun.chen @ 2013-05-07  2:43 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1367892390.3398.12@snotra>

On 05/07/2013 10:06 AM, Scott Wood wrote:
> On 05/06/2013 08:56:25 PM, tiejun.chen wrote:
>> On 05/07/2013 07:50 AM, Scott Wood wrote:
>>> On 05/05/2013 10:13:17 PM, tiejun.chen wrote:
>>>> On 05/06/2013 11:10 AM, Tiejun Chen wrote:
>>>>> For the external interrupt, the decrementer exception and the doorbell
>>>>> excpetion, we also need to soft-disable interrupts while doing as host
>>>>> interrupt handlers since the DO_KVM hook is always performed to skip
>>>>> EXCEPTION_COMMON then miss this original chance with the 'ints'
>>>>> (INTS_DISABLE).
>>>
>>> http://patchwork.ozlabs.org/patch/241344/
>>> http://patchwork.ozlabs.org/patch/241412/
>>>
>>> :-)
>>
>> I'm observing the same behaviour as well:
>>
>>     WARN_ON_ONCE(!irqs_disabled());
>
> So, could you explain the benefits of your approach over what's being discussed
> in those threads?

They're a long thread so I think I need to take time to see :)

>
>>> Why wouldn't we always disable them?  kvmppc_handle_exit() will enable
>>> interrupts when it's ready.
>>
>> This only disable soft interrupt for kvmppc_restart_interrupt() that restarts
>> interrupts if they were meant for the host:
>>
>> a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
>> BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL
>
> Those aren't the only exceptions that can end up going to the host.  We could
> get a TLB miss that results in a heavyweight MMIO exit, etc.

This is like host handler, so I'm just disabling soft interrupt during 
kvmppc_restart_interrupt() for Doorbell interrupt/Decrementer Interrupt/External 
Input Interrupt.

I don't see anything should be disabled for any TLB exception in host handler.

>
> And I'd rather see any fix for this problem stay out of the asm code.

We already have an appropriate SOFT_DISABLE_INTS so I think we can take this 
easily :)

>
>> b. bl      kvmppc_handle_exit
>>
>> c. kvmppc_handle_exit()
>> {
>>         int r = RESUME_HOST;
>>         int s;
>>
>>         /* update before a new last_exit_type is rewritten */
>>         kvmppc_update_timing_stats(vcpu);
>>
>>         /* restart interrupts if they were meant for the host */
>>         kvmppc_restart_interrupt(vcpu, exit_nr);
>>
>>         local_irq_enable();    ==> Enable again.
>> ....
>>
>> And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow?
>>
>> #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)           \
>>         START_EXCEPTION(label);                                         \
>>         NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\
>>         EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*)             \
>>     ...
>
> Could you elaborate on what you mean?

In host handler, we always use MASKABLE_EXCEPTION() to define-to-handle some 
exceptions: Doorbell interrupt/Decrementer Interrupt/External Input Interrupt:

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)           \
         START_EXCEPTION(label);                                         \
         NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\
         EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*)             \

This would call INTS_DISABLE, which is equal to SOFT_DISABLE_INTS(), to disable 
soft interrupt before call all associated handlers: 
do_IRQ()/timer_interrupt()/doorbell_exception().

But DO_KVM hook always skips INTS_DISABLE.

So I think we also need to do INTS_DISABLE for kvmppc_restart_interrupt() since 
actually that restarts interrupts for the host with a similar way as they are 
called by host.

Tiejun

^ permalink raw reply

* Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
From: Scott Wood @ 2013-05-07  3:04 UTC (permalink / raw)
  To: tiejun.chen; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <51886A59.80807@windriver.com>

On 05/06/2013 09:43:37 PM, tiejun.chen wrote:
> On 05/07/2013 10:06 AM, Scott Wood wrote:
>> On 05/06/2013 08:56:25 PM, tiejun.chen wrote:
>>> On 05/07/2013 07:50 AM, Scott Wood wrote:
>>>> On 05/05/2013 10:13:17 PM, tiejun.chen wrote:
>>>>> On 05/06/2013 11:10 AM, Tiejun Chen wrote:
>>>>>> For the external interrupt, the decrementer exception and the =20
>>>>>> doorbell
>>>>>> excpetion, we also need to soft-disable interrupts while doing =20
>>>>>> as host
>>>>>> interrupt handlers since the DO_KVM hook is always performed to =20
>>>>>> skip
>>>>>> EXCEPTION_COMMON then miss this original chance with the 'ints'
>>>>>> (INTS_DISABLE).
>>>>=20
>>>> http://patchwork.ozlabs.org/patch/241344/
>>>> http://patchwork.ozlabs.org/patch/241412/
>>>>=20
>>>> :-)
>>>=20
>>> I'm observing the same behaviour as well:
>>>=20
>>>     WARN_ON_ONCE(!irqs_disabled());
>>=20
>> So, could you explain the benefits of your approach over what's =20
>> being discussed
>> in those threads?
>=20
> They're a long thread so I think I need to take time to see :)
>=20
>>=20
>>>> Why wouldn't we always disable them?  kvmppc_handle_exit() will =20
>>>> enable
>>>> interrupts when it's ready.
>>>=20
>>> This only disable soft interrupt for kvmppc_restart_interrupt() =20
>>> that restarts
>>> interrupts if they were meant for the host:
>>>=20
>>> a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
>>> BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL
>>=20
>> Those aren't the only exceptions that can end up going to the host.  =20
>> We could
>> get a TLB miss that results in a heavyweight MMIO exit, etc.
>=20
> This is like host handler, so I'm just disabling soft interrupt =20
> during kvmppc_restart_interrupt() for Doorbell interrupt/Decrementer =20
> Interrupt/External Input Interrupt.
>=20
> I don't see anything should be disabled for any TLB exception in host =20
> handler.

Every exception type needs consistent lazy EE state once we hit the =20
local_irq_enable() (or any other C code that cares).

Plus, if you're going to add code to make something conditional, you =20
should have a good reason for making it conditional.  Being more like =20
the 64-bit host handler for its own sake isn't good enough, especially =20
if you introduce differences between 32 and 64 bit in the process.

>> And I'd rather see any fix for this problem stay out of the asm code.
>=20
> We already have an appropriate SOFT_DISABLE_INTS so I think we can =20
> take this easily :)

An unconditional call to hard_irq_disable() at the beginning of =20
kvmppc_handle_exit() should suffice.  I already have this in the next =20
version of my patch that I'll be posting shortly.

Note that we need this on 32-bit as well, so that trace_hardirqs_off() =20
gets called.

-Scott=

^ permalink raw reply

* Re: [PATCH] kvm/ppc/booke64: Hard disable interrupts when entering the guest
From: Scott Wood @ 2013-05-07  3:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm, Alexander Graf, kvm-ppc, Mihai Caraman, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <1367884994.29496.28.camel@pasglop>

On 05/06/2013 07:03:14 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-06 at 18:53 -0500, Scott Wood wrote:
> >
> > > Ie. The last stage of entry will hard enable, so they should be
> > > soft-enabled too... if not, latency trackers will consider the =20
> whole
> > > guest periods as "interrupt disabled"...
> >
> > OK... I guess we already have that problem on 32-bit as well?
>=20
> 32-bit doesn't do lazy disable, so the situation is a lot easier =20
> there.

Right, but it still currently enters the guest with interrupts marked =20
as disabled, so we'd have the same latency tracker issue.

> Another problem is that hard_irq_disable() doesn't call
> trace_hardirqs_off()... We might want to fix that:
>=20
> static inline void hard_irq_disable(void)
> {
> 	__hard_irq_disable();
> 	if (get_paca()->soft_enabled)
> 		trace_hardirqs_off();
> 	get_paca()->soft_enabled =3D 0;
> 	get_paca()->irq_happened |=3D PACA_IRQ_HARD_DIS;
> }

Is it possible there are places that assume the current behavior?

> > We also don't want PACA_IRQ_HARD_DIS to be cleared the way
> > prep_irq_for_idle() does, because that's what lets the
> > local_irq_enable() do the hard-enabling after we exit the guest.
>=20
> Then set it again. Don't leave the kernel in a state where =20
> soft_enabled
> is 1 and irq_happened is non-zero. It might work in the specific KVM
> case we are looking at now because we know we are coming back via KVM
> exit and putting things right again but it's fragile, somebody will =20
> come
> back and break it, etc...

KVM is a pretty special case -- at least on booke, it's required that =20
all exits from guest state go through the KVM exception code.  I think =20
it's less likely that that changes, than something breaks in the code =20
to fix up lazy ee state (especially since we've already seen the latter =20
happen).

I'll give it a shot, though.

> If necessary, create (or improve existing) helpers that do the right
> state adjustement. The cost of a couple of byte stores is negligible,
> I'd rather you make sure everything remains in sync at all times.

My concern was mainly about complexity -- it seemed simpler to just say =20
that the during guest execution, CPU is in a special state that is not =20
visible to anything that cares about lazy EE.  The fact that EE can =20
actually be *off* and we still take the interrupt supports its =20
specialness. :-)

-Scott=

^ permalink raw reply

* Re: [PATCH 2/5] KVM: PPC: iommu: Add missing kvm_iommu_map_pages/kvm_iommu_unmap_pages
From: Alexey Kardashevskiy @ 2013-05-07  3:28 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Alexander Graf, kvm-ppc, linux-kernel, Paul Mackerras,
	linux-pci, linuxppc-dev, David Gibson
In-Reply-To: <1367890979.2868.93.camel@ul30vt.home>

On 05/07/2013 11:42 AM, Alex Williamson wrote:
> On Tue, 2013-05-07 at 10:49 +1000, Alexey Kardashevskiy wrote:
>> On 05/07/2013 07:07 AM, Alex Williamson wrote:
>>> On Mon, 2013-05-06 at 17:21 +1000, aik@ozlabs.ru wrote:
>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>
>>>> The IOMMU API implements groups creating/deletion, device binding
>>>> and IOMMU map/unmap operations.
>>>>
>>>> The PowerPC implementation uses most of the API except map/unmap
>>>> operations, which are implemented on POWER using hypercalls.
>>>>
>>>> However, in order to link a kernel with the CONFIG_IOMMU_API enabled,
>>>> the empty kvm_iommu_map_pages/kvm_iommu_unmap_pages have to be
>>>> defined, so this defines them.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> Cc: David Gibson <david@gibson.dropbear.id.au>
>>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>>> ---
>>>>  arch/powerpc/include/asm/kvm_host.h |   14 ++++++++++++++
>>>>  1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>>> index b6a047e..c025d91 100644
>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>> @@ -603,4 +603,18 @@ struct kvm_vcpu_arch {
>>>>  
>>>>  #define __KVM_HAVE_ARCH_WQP
>>>>  
>>>> +#ifdef CONFIG_IOMMU_API
>>>> +/* POWERPC does not use IOMMU API for mapping/unmapping */
>>>> +static inline int kvm_iommu_map_pages(struct kvm *kvm,
>>>> +		struct kvm_memory_slot *slot)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static inline void kvm_iommu_unmap_pages(struct kvm *kvm,
>>>> +		struct kvm_memory_slot *slot)
>>>> +{
>>>> +}
>>>> +#endif /* CONFIG_IOMMU_API */
>>>> +
>>>>  #endif /* __POWERPC_KVM_HOST_H__ */
>>>
>>> This is no longer needed, Gleb applied my patch for 3.10 that make all
>>> of KVM device assignment dependent on a build config option and the top
>>> level kvm_host.h now includes this when that is not set.  Thanks,
>>
>> Cannot find it, could you point me please where it is on github or
>> git.kernel.org? Thanks.
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a5bab1004729f3302c776e53ee7c895b98bb1ce


Yes, I confirm, this is patch is not need any more. Thanks!



-- 
Alexey

^ permalink raw reply

* [PATCH] powerpc: fix numa distance for form0 device tree
From: Michael Ellerman @ 2013-05-07  3:49 UTC (permalink / raw)
  To: stable; +Cc: linuxppc-dev, Anton Blanchard

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream.

The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.

commit 41eab6f88f24124df89e38067b3766b7bef06ddb
powerpc/numa: Use form 1 affinity to setup node distance

Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.

All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes.  This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same.  (value of 'level' in sched_init_numa() will remain 0)

Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/mm/numa.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index bba87ca..6a252c4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -201,7 +201,7 @@ int __node_distance(int a, int b)
 	int distance = LOCAL_DISTANCE;
 
 	if (!form1_affinity)
-		return distance;
+		return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
 
 	for (i = 0; i < distance_ref_points_depth; i++) {
 		if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] kvm/ppc/booke64: Hard disable interrupts when entering the guest
From: Benjamin Herrenschmidt @ 2013-05-07  3:53 UTC (permalink / raw)
  To: Scott Wood
  Cc: kvm, Alexander Graf, kvm-ppc, Mihai Caraman, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <1367895926.3398.14@snotra>

On Mon, 2013-05-06 at 22:05 -0500, Scott Wood wrote:
> On 05/06/2013 07:03:14 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-05-06 at 18:53 -0500, Scott Wood wrote:
> > >
> > > > Ie. The last stage of entry will hard enable, so they should be
> > > > soft-enabled too... if not, latency trackers will consider the  
> > whole
> > > > guest periods as "interrupt disabled"...
> > >
> > > OK... I guess we already have that problem on 32-bit as well?
> > 
> > 32-bit doesn't do lazy disable, so the situation is a lot easier  
> > there.
> 
> Right, but it still currently enters the guest with interrupts marked  
> as disabled, so we'd have the same latency tracker issue.
> 
> > Another problem is that hard_irq_disable() doesn't call
> > trace_hardirqs_off()... We might want to fix that:
> > 
> > static inline void hard_irq_disable(void)
> > {
> > 	__hard_irq_disable();
> > 	if (get_paca()->soft_enabled)
> > 		trace_hardirqs_off();
> > 	get_paca()->soft_enabled = 0;
> > 	get_paca()->irq_happened |= PACA_IRQ_HARD_DIS;
> > }
> 
> Is it possible there are places that assume the current behavior?

There aren't many callers, I think this should be safe. Most
callers call it with interrupts already soft disabled, so that
should be a nop in these cases (idle for example).

But I can give it a quick spin today on a machine or two.

> > > We also don't want PACA_IRQ_HARD_DIS to be cleared the way
> > > prep_irq_for_idle() does, because that's what lets the
> > > local_irq_enable() do the hard-enabling after we exit the guest.
> > 
> > Then set it again. Don't leave the kernel in a state where  
> > soft_enabled
> > is 1 and irq_happened is non-zero. It might work in the specific KVM
> > case we are looking at now because we know we are coming back via KVM
> > exit and putting things right again but it's fragile, somebody will  
> > come
> > back and break it, etc...
> 
> KVM is a pretty special case -- at least on booke, it's required that  
> all exits from guest state go through the KVM exception code.  I think  
> it's less likely that that changes, than something breaks in the code  
> to fix up lazy ee state (especially since we've already seen the latter  
> happen).
> 
> I'll give it a shot, though.
> 
> > If necessary, create (or improve existing) helpers that do the right
> > state adjustement. The cost of a couple of byte stores is negligible,
> > I'd rather you make sure everything remains in sync at all times.
> 
> My concern was mainly about complexity -- it seemed simpler to just say  
> that the during guest execution, CPU is in a special state that is not  
> visible to anything that cares about lazy EE.  The fact that EE can  
> actually be *off* and we still take the interrupt supports its  
> specialness. :-)

Yeah ... sort of :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: fix numa distance for form0 device tree
From: Greg KH @ 2013-05-07  4:06 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Anton Blanchard, stable
In-Reply-To: <1367898574-20594-1-git-send-email-michael@ellerman.id.au>

On Tue, May 07, 2013 at 01:49:34PM +1000, Michael Ellerman wrote:
> From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> 
> Commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream.
> 
> The following commit breaks numa distance setup for old powerpc
> systems that use form0 encoding in device tree.
> 
> commit 41eab6f88f24124df89e38067b3766b7bef06ddb
> powerpc/numa: Use form 1 affinity to setup node distance
> 
> Device tree node /rtas/ibm,associativity-reference-points would
> index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
> form1 encoding detected by ibm,architecture-vec-5 property.
> 
> All modern systems use form1 and current kernel code is correct.
> However, on older systems with form0 encoding, the numa distance
> will get hard coded as LOCAL_DISTANCE for all nodes.  This causes
> task scheduling anomaly since scheduler will skip building numa
> level domain (topmost domain with all cpus) if all numa distances
> are same.  (value of 'level' in sched_init_numa() will remain 0)
> 
> Prior to the above commit:
> ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
> 
> Restoring compatible behavior with this patch for old powerpc systems
> with device tree where numa distance are encoded as form0.
> 
> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
> ---
>  arch/powerpc/mm/numa.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

What stable tree should this be applied to?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] powerpc: fix numa distance for form0 device tree
From: Michael Ellerman @ 2013-05-07  4:23 UTC (permalink / raw)
  To: Greg KH; +Cc: linuxppc-dev, Anton Blanchard, stable
In-Reply-To: <20130507040614.GA19525@kroah.com>

On Mon, May 06, 2013 at 09:06:15PM -0700, Greg KH wrote:
> On Tue, May 07, 2013 at 01:49:34PM +1000, Michael Ellerman wrote:
> > From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> > 
> > Commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream.
> > 
> > The following commit breaks numa distance setup for old powerpc
> > systems that use form0 encoding in device tree.
> > 
> > commit 41eab6f88f24124df89e38067b3766b7bef06ddb
> > powerpc/numa: Use form 1 affinity to setup node distance
> > 
> > Device tree node /rtas/ibm,associativity-reference-points would
> > index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
> > form1 encoding detected by ibm,architecture-vec-5 property.
> > 
> > All modern systems use form1 and current kernel code is correct.
> > However, on older systems with form0 encoding, the numa distance
> > will get hard coded as LOCAL_DISTANCE for all nodes.  This causes
> > task scheduling anomaly since scheduler will skip building numa
> > level domain (topmost domain with all cpus) if all numa distances
> > are same.  (value of 'level' in sched_init_numa() will remain 0)
> > 
> > Prior to the above commit:
> > ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
> > 
> > Restoring compatible behavior with this patch for old powerpc systems
> > with device tree where numa distance are encoded as form0.
> > 
> > Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> > Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
> > ---
> >  arch/powerpc/mm/numa.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> What stable tree should this be applied to?

All of them please.

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox