LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH -V7 01/18] mm/THP: HPAGE_SHIFT is not a #define on some arch
From: Aneesh Kumar K.V @ 2013-05-03 18:51 UTC
  To: David Gibson; +Cc: linuxppc-dev, paulus, linux-mm
In-Reply-To: <20130430050101.GY20202@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Tue, Apr 30, 2013 at 09:12:09AM +0530, Aneesh Kumar K.V wrote:
>> David Gibson <dwg@au1.ibm.com> writes:
>> 
>> > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote:
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
>> >> and other derived values like HPAGE_PMD_ORDER are not constants. So move
>> >> that to hugepage_init
>> >
>> > This seems to miss the point.  Those variables may be defined in
>> > terms of HPAGE_SHIFT right now, but that is of itself kind of broken.
>> > The transparent hugepage mechanism only works if the hugepage size is
>> > equal to the PMD size - and PMD_SHIFT remains a compile time constant.
>> >
>> > There's no reason having transparent hugepage should force the PMD
>> > size of hugepage to be the default for other purposes - it should be
>> > possible to do THP as long as PMD-sized is a possible hugepage size.
>> >
>> 
>> THP code does
>> 
>> #define HPAGE_PMD_SHIFT HPAGE_SHIFT
>> #define HPAGE_PMD_MASK HPAGE_MASK
>> #define HPAGE_PMD_SIZE HPAGE_SIZE
>> 
>> I had two options, one to move all those in terms of PMD_SHIFT
>
> This is a much better option than the one you've taken now, and really
> shouldn't be that hard.  The THP code is much more strongly tied to
> the fact that it is a PMD than the fact that it's the same size as
> explicit huge pages.
>

OK, I tried the above and it turned out to be much simpler. I will have
to make sure I didn't break other archs that support THP.

-aneesh
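
A minimal sketch of the option discussed above: defining the THP constants in terms of PMD_SHIFT instead of HPAGE_SHIFT. The PAGE_SHIFT and PMD_SHIFT values are illustrative assumptions (4K base pages, 2M PMD), not those of any particular powerpc configuration, and thp_pmd_nr_pages() is a hypothetical helper added only for illustration.

```c
#include <assert.h>

/* Illustrative values only: 4K base pages and a 2M PMD are assumed. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21

/*
 * THP constants derived from the PMD geometry, which is a compile-time
 * constant, rather than from the runtime-selectable HPAGE_SHIFT.
 */
#define HPAGE_PMD_SHIFT	PMD_SHIFT
#define HPAGE_PMD_SIZE	(1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
#define HPAGE_PMD_ORDER	(HPAGE_PMD_SHIFT - PAGE_SHIFT)

/* Hypothetical helper: number of base pages covered by one PMD-sized page. */
static unsigned long thp_pmd_nr_pages(void)
{
	return 1UL << HPAGE_PMD_ORDER;
}
```

With these definitions every derived value stays a constant expression, so nothing needs to move into an init function.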

* Re: [PATCH -V7 02/10] powerpc/THP: Implement transparent hugepages for ppc64
From: Aneesh Kumar K.V @ 2013-05-03 18:54 UTC
  To: David Gibson, Benjamin Herrenschmidt; +Cc: linux-mm, linuxppc-dev, paulus
In-Reply-To: <20130503115428.GW13041@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Fri, May 03, 2013 at 06:19:03PM +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2013-05-03 at 14:52 +1000, David Gibson wrote:
>> > Here, specifically, the fact that PAGE_BUSY is in PAGE_THP_HPTEFLAGS
>> > is likely to be bad.  If the page is busy, it's in the middle of
>> > update so can't stably be considered the same as anything.
>> 
>> _PAGE_BUSY is more like a read lock. It means it's being hashed, so what
>> is not stable is _PAGE_HASHPTE, slot index, _ACCESSED and _DIRTY. The
>> rest is stable and usually is what pmd_same looks at (though I have a
>> small doubt vs. _ACCESSED and _DIRTY but at least x86 doesn't care since
>> they are updated by HW).
>
> Ok.  It still seems very odd to me that _PAGE_BUSY would be in the THP
> version of _PAGE_HASHPTE, but not the normal one.
>

64-4k definition:
/* PTE flags to conserve for HPTE identification */
#define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | \
			 _PAGE_SECONDARY | _PAGE_GROUP_IX)

64-64K definition:
/* PTE flags to conserve for HPTE identification */
#define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_COMBO)

BTW, I have dropped that change in my current patch. I dropped the
usage of _PAGE_COMBO and instead started using _PAGE_4K_PFN for
identifying THP. That enabled me to use _PAGE_HPTEFLAGS as is.

-aneesh

* Re: [PATCH -V7 04/10] powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages
From: Aneesh Kumar K.V @ 2013-05-03 18:58 UTC
  To: David Gibson; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130503045323.GP13041@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Mon, Apr 29, 2013 at 01:21:45AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> What's the difference in meaning between pmd_huge() and pmd_large()?
>

#ifndef CONFIG_HUGETLB_PAGE
#define pmd_huge(x)	0
#endif

Also, pmd_large() checks for the THP PTE flag and _PAGE_PRESENT.

-aneesh

* Re: [PATCH -V7 09/10] powerpc: Optimize hugepage invalidate
From: Aneesh Kumar K.V @ 2013-05-03 19:05 UTC
  To: David Gibson; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130503052846.GU13041@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Mon, Apr 29, 2013 at 01:21:50AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> Hugepage invalidate involves invalidating multiple hpte entries.
>> Optimize the operation using H_BULK_REMOVE on lpar platforms.
>> On native, reduce the number of TLB flushes.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Since this is purely an optimization, have you tried reproducing the
> bugs you're chasing with this patch not included?

That was due to not handling THP split while walking the page table. I have
fixed that and will post the next version soon.

>
>> ---
>>  arch/powerpc/include/asm/machdep.h    |   3 +
>>  arch/powerpc/mm/hash_native_64.c      |  78 +++++++++++++++++++++
>>  arch/powerpc/mm/pgtable_64.c          |  13 +++-
>>  arch/powerpc/platforms/pseries/lpar.c | 126 ++++++++++++++++++++++++++++++++--
>>  4 files changed, 210 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>> index 3f3f691..5d1e7d2 100644
>> --- a/arch/powerpc/include/asm/machdep.h
>> +++ b/arch/powerpc/include/asm/machdep.h
>> @@ -56,6 +56,9 @@ struct machdep_calls {

.....

>>  
>> +/*
>> + * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
>> + * to make sure that we avoid bouncing the hypervisor tlbie lock.
>> + */
>> +#define PPC64_HUGE_HPTE_BATCH 12
>> +
>> +static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
>> +					     unsigned long *vpn, int count,
>> +					     int psize, int ssize)
>> +{
>> +	unsigned long param[9];
>
> [9]?  I only see 8 elements being used.

Cut-and-paste error from pSeries_lpar_flush_hash_range.

>
>> +	int i = 0, pix = 0, rc;
>> +	unsigned long flags = 0;
>> +	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
>> +
>> +	if (lock_tlbie)
>> +		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>
> Why are these hash operations being called with the tlbie lock held?

If the firmware doesn't support lockless TLBIE, we need to do the locking
on the guest side. pSeries_lpar_flush_hash_range does that.

>
>> +
>> +	for (i = 0; i < count; i++) {
>> +
>> +		if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
>> +			pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize,
>> +						     ssize, 0);
>
> Couldn't you set the ppc_md hook based on the firmware request to
> avoid this test in the inner loop?  I don't see any tlbie operations
> at all.

I didn't get that.

>
>> +		} else {
>> +			param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
>> +			param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
>> +			pix += 2;
>> +			if (pix == 8) {
>> +				rc = plpar_hcall9(H_BULK_REMOVE, param,
>> +						  param[0], param[1], param[2],
>> +						  param[3], param[4], param[5],
>> +						  param[6], param[7]);
>> +				BUG_ON(rc != H_SUCCESS);
>> +				pix = 0;
>> +			}
>> +		}
>> +	}
>> +	if (pix) {
>> +		param[pix] = HBR_END;
>> +		rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1],
>> +				  param[2], param[3], param[4], param[5],
>> +				  param[6], param[7]);
>> +		BUG_ON(rc != H_SUCCESS);
>> +	}
>> +
>> +	if (lock_tlbie)
>> +		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
>> +}
>> +
>> +static void pSeries_lpar_hugepage_invalidate(struct mm_struct *mm,
>> +				       unsigned char *hpte_slot_array,
>> +				       unsigned long addr, int psize)
>> +{
>> +	int ssize = 0, i, index = 0;
>> +	unsigned long s_addr = addr;
>> +	unsigned int max_hpte_count, valid;
>> +	unsigned long vpn_array[PPC64_HUGE_HPTE_BATCH];
>> +	unsigned long slot_array[PPC64_HUGE_HPTE_BATCH];
>> +	unsigned long shift, hidx, vpn = 0, vsid, hash, slot;
>> +
>> +	shift = mmu_psize_defs[psize].shift;
>> +	max_hpte_count = HUGE_PAGE_SIZE >> shift;
>> +
>> +	for (i = 0; i < max_hpte_count; i++) {
>> +		/*
>> +		 * 8 bits per each hpte entries
>> +		 * 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
>> +		 */
>> +		valid = hpte_slot_array[i] & 0x1;
>> +		if (!valid)
>> +			continue;
>> +		hidx =  hpte_slot_array[i]  >> 1;
>> +
>> +		/* get the vpn */
>> +		addr = s_addr + (i * (1ul << shift));
>> +		if (!is_kernel_addr(addr)) {
>> +			ssize = user_segment_size(addr);
>> +			vsid = get_vsid(mm->context.id, addr, ssize);
>> +			WARN_ON(vsid == 0);
>> +		} else {
>> +			vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
>> +			ssize = mmu_kernel_ssize;
>> +		}
>> +
>> +		vpn = hpt_vpn(addr, vsid, ssize);
>> +		hash = hpt_hash(vpn, shift, ssize);
>> +		if (hidx & _PTEIDX_SECONDARY)
>> +			hash = ~hash;
>> +
>> +		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
>> +		slot += hidx & _PTEIDX_GROUP_IX;
>> +
>> +		slot_array[index] = slot;
>> +		vpn_array[index] = vpn;
>> +		if (index == PPC64_HUGE_HPTE_BATCH - 1) {
>> +			/*
>> +			 * Now do a bulk invalidate
>> +			 */
>> +			__pSeries_lpar_hugepage_invalidate(slot_array,
>> +							   vpn_array,
>> +							   PPC64_HUGE_HPTE_BATCH,
>> +							   psize, ssize);
>
> I don't really understand why you have one loop in this function, then
> another in the __ function.

If we didn't accumulate a full batch of entries, the call above is never
made. Hence the remaining entries have to be bulk-removed after the
loop.

>
>> +			index = 0;
>> +		} else
>> +			index++;
>> +	}
>> +	if (index)
>> +		__pSeries_lpar_hugepage_invalidate(slot_array, vpn_array,
>> +						   index, psize, ssize);
>> +}
>> +
>>  static void pSeries_lpar_hpte_removebolted(unsigned long ea,
>>  					   int psize, int ssize)
>>  {
>> @@ -360,13 +478,6 @@ static void pSeries_lpar_hpte_removebolted(unsigned long ea,
>>  	pSeries_lpar_hpte_invalidate(slot, vpn, psize, ssize, 0);
>>  }
>>  
>> -/* Flag bits for H_BULK_REMOVE */
>> -#define HBR_REQUEST	0x4000000000000000UL
>> -#define HBR_RESPONSE	0x8000000000000000UL
>> -#define HBR_END		0xc000000000000000UL
>> -#define HBR_AVPN	0x0200000000000000UL
>> -#define HBR_ANDCOND	0x0100000000000000UL
>> -
>>  /*
>>   * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>>   * lock.
>> @@ -452,6 +563,7 @@ void __init hpte_init_lpar(void)
>>  	ppc_md.hpte_removebolted = pSeries_lpar_hpte_removebolted;
>>  	ppc_md.flush_hash_range	= pSeries_lpar_flush_hash_range;
>>  	ppc_md.hpte_clear_all   = pSeries_lpar_hptab_clear;
>> +	ppc_md.hugepage_invalidate = pSeries_lpar_hugepage_invalidate;
>>  }
>>  
>>  #ifdef CONFIG_PPC_SMLPAR
>
> -- 

-aneesh
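
The two-level structure questioned above (a batching loop in the caller, a flush loop in the helper, plus a final flush for the remainder) can be sketched outside the kernel as follows. This is only an illustration of the accumulate-and-flush-the-tail pattern: flush_batch() is a hypothetical stand-in for the hypervisor call that merely records batch sizes, and the entry values are dummies rather than real slot/vpn pairs.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 12	/* mirrors PPC64_HUGE_HPTE_BATCH in the patch */

/* Hypothetical stand-in for the hypervisor call: records each batch size. */
static int flush_sizes[16];
static int flush_calls;

static void flush_batch(const unsigned long *slots, int count)
{
	(void)slots;
	flush_sizes[flush_calls++] = count;
}

/*
 * Collect valid entries into a fixed-size batch, flush whenever the batch
 * fills, then flush the partial batch left over after the loop.
 */
static void invalidate_all(const unsigned char *valid, size_t n)
{
	unsigned long batch[BATCH];
	int index = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(valid[i] & 0x1))
			continue;	/* skip entries without the valid bit */
		batch[index++] = i;
		if (index == BATCH) {
			flush_batch(batch, index);
			index = 0;
		}
	}
	if (index)
		flush_batch(batch, index);
}
```

Without the trailing flush, any entries beyond the last full batch would never be invalidated.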

* Re: [PATCH -V7 10/10] powerpc: disable assert_pte_locked
From: Aneesh Kumar K.V @ 2013-05-03 19:07 UTC
  To: David Gibson; +Cc: paulus, linuxppc-dev, linux-mm
In-Reply-To: <20130503053027.GV13041@truffula.fritz.box>

David Gibson <dwg@au1.ibm.com> writes:

> On Mon, Apr 29, 2013 at 01:21:51AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> With THP we set pmd to none before we do pte_clear. Hence we can't
>> walk the page table to get the pte lock ptr and verify whether it is locked.
>> THP does take the pte lock before calling pte_clear, so we don't change the
>> locking rules here. It is just that we can't use page table walking to check
>> whether pte locks are held with THP.
>> 
>> NOTE: This needs to be re-written. Not to be merged upstream.
>
> So, rewrite it..


That is something we need to discuss more. We can't do the pte_locked
assert the way we do now because, as explained above, THP collapse
depends on setting pmd to none before doing pte_clear. So we clearly
cannot walk the page table and find the ptl to check whether we are
holding that lock. But yes, these asserts are valid. Those functions
should be called holding ptl locks. I still haven't found an alternative
way to do those asserts. Any suggestions?


>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/pgtable.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index 214130a..d77f94f 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -224,6 +224,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>>  #ifdef CONFIG_DEBUG_VM
>>  void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
>>  {
>> +#if 0
>>  	pgd_t *pgd;
>>  	pud_t *pud;
>>  	pmd_t *pmd;
>> @@ -237,6 +238,7 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
>>  	pmd = pmd_offset(pud, addr);
>>  	BUG_ON(!pmd_present(*pmd));
>>  	assert_spin_locked(pte_lockptr(mm, pmd));
>> +#endif
>>  }
>>  #endif /* CONFIG_DEBUG_VM */
>>  
>

-aneesh

* RE: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Caraman Mihai Claudiu-B02008 @ 2013-05-03 20:01 UTC
  To: Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <1367604287.19391.2@snotra>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Friday, May 03, 2013 9:05 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org; Caraman Mihai Claudiu-B02008
> Subject: Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
>
> > The unresponsiveness has to do with the fact that
> > arch_local_irq_restore()
> > does not guarantee to hard-enable interrupts.
>
> Could you elaborate?  If the saved IRQ state was "enabled", why
> wouldn't arch_local_irq_restore() hard-enable IRQs?  The last thing it
> does is __hard_irq_enable().

	if (!irq_happened)
		return;

>
> Where is the arch_local_irq_restore() instance you're talking about?

./arch/powerpc/kernel/irq.c

>
> > To do so replace exception
> > function calls like timer_interrupt() with irq_happened flags. The
> > local_irq_enable() call takes care of replaying them and lets the
> > interrupts
> > hard enabled.
>
> Not sure what you mean by "lets the interrupts hard enabled"... Do you
> mean the EE bit in regs->msr, as opposed to the EE bit in the current
> MSR?

If irq_happened "the last thing it does is __hard_irq_enable()".

> > @@ -789,16 +788,16 @@ static void kvmppc_restart_interrupt(struct
> > kvm_vcpu *vcpu,
> >  	switch (exit_nr) {
> >  	case BOOKE_INTERRUPT_EXTERNAL:
> >  		kvmppc_fill_pt_regs(&regs);
> > -		do_IRQ(&regs);
> > +		local_paca->irq_happened |= PACA_IRQ_EE;
> >  		break;
>
> Aren't you breaking 32-bit here?

I had eyes only for 64-bit hangs :)

-Mike
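
The early return quoted above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the kernel's arch_local_irq_restore(): struct toy_paca, TOY_IRQ_EE and toy_irq_restore() are hypothetical names, hard_enabled stands in for MSR[EE], and replay is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IRQ_EE	0x01	/* hypothetical "external interrupt pending" bit */

/* Toy per-CPU state; the field names echo those used in the thread. */
struct toy_paca {
	bool soft_enabled;		/* toggled by local_irq_disable()/enable() */
	bool hard_enabled;		/* stands in for MSR[EE] */
	unsigned int irq_happened;	/* interrupts recorded while soft-disabled */
	int replayed;			/* how many recorded interrupts were replayed */
};

/* Simplified restore: the hard state is only touched on the replay path. */
static void toy_irq_restore(struct toy_paca *p, bool en)
{
	p->soft_enabled = en;
	if (!en)
		return;
	if (!p->irq_happened)
		return;		/* nothing recorded: hard state is left as it was */

	p->irq_happened = 0;
	p->replayed++;
	p->hard_enabled = true;
}
```

In this model, a CPU that is hard-disabled with nothing recorded in irq_happened stays hard-disabled after the restore, whereas recording a pending interrupt first makes the restore replay it and hard-enable.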

* Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Scott Wood @ 2013-05-03 20:15 UTC
  To: Caraman Mihai Claudiu-B02008
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
In-Reply-To: <300B73AA675FCE4A93EB4FC1D42459FF3E984C@039-SN2MPN1-013.039d.mgd.msft.net>

On 05/03/2013 03:01:26 PM, Caraman Mihai Claudiu-B02008 wrote:
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Friday, May 03, 2013 9:05 PM
> > To: Caraman Mihai Claudiu-B02008
> > Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> > dev@lists.ozlabs.org; Caraman Mihai Claudiu-B02008
> > Subject: Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
> >
> > > The unresponsiveness has to do with the fact that
> > > arch_local_irq_restore()
> > > does not guarantee to hard-enable interrupts.
> >
> > Could you elaborate?  If the saved IRQ state was "enabled", why
> > wouldn't arch_local_irq_restore() hard-enable IRQs?  The last thing it
> > does is __hard_irq_enable().
>
> 	if (!irq_happened)
> 		return;

OK, so the problem is that we're not setting PACA_IRQ_HARD_DIS when we
hard-disable interrupts?

> > Where is the arch_local_irq_restore() instance you're talking about?
>
> ./arch/powerpc/kernel/irq.c

I meant the caller. :-P

-Scott

* RE: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Caraman Mihai Claudiu-B02008 @ 2013-05-03 20:56 UTC
  To: Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <1367612101.19391.8@snotra>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Friday, May 03, 2013 11:15 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: Wood Scott-B07421; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
>
> > > > The unresponsiveness has to do with the fact that
> > > > arch_local_irq_restore()
> > > > does not guarantee to hard-enable interrupts.
> > >
> > > Could you elaborate?  If the saved IRQ state was "enabled", why
> > > wouldn't arch_local_irq_restore() hard-enable IRQs?  The last thing it
> > > does is __hard_irq_enable().
> >
> > 	if (!irq_happened)
> > 		return;
>
> OK, so the problem is that we're not setting PACA_IRQ_HARD_DIS when we
> hard-disable interrupts?

We enter the guest with local_irq_disable(), which means soft-disabled; when
do we hard-disable interrupts? If we follow the host exception handlers' model,
they set PACA_IRQ_EE/DEC/DBELL but not PACA_IRQ_HARD_DIS. Can you give it
a try to see how KVM behaves with PACA_IRQ_HARD_DIS? I can't do it right now.

>
> > > Where is the arch_local_irq_restore() instance you're talking about?
> >
> > ./arch/powerpc/kernel/irq.c
>
> I meant the caller. :-P

./arch/powerpc/include/asm/hw_irq.h

static inline unsigned long arch_local_irq_disable(void)
{
        unsigned long flags, zero;

        asm volatile(
                "li %1,0; lbz %0,%2(13); stb %1,%2(13)"
                : "=r" (flags), "=&r" (zero)
                : "i" (offsetof(struct paca_struct, soft_enabled))
                : "memory");

        return flags;
}

extern void arch_local_irq_restore(unsigned long);

static inline void arch_local_irq_enable(void)
{
        arch_local_irq_restore(1);
}

-Mike

* Re: net/eth/ibmveth: Fixup retrieval of MAC address
From: Benjamin Herrenschmidt @ 2013-05-03 21:17 UTC
  To: Ben Hutchings; +Cc: netdev, linuxppc-dev, David Miller, David Gibson
In-Reply-To: <1367598632.2756.3.camel@bwh-desktop.uk.solarflarecom.com>

On Fri, 2013-05-03 at 17:30 +0100, Ben Hutchings wrote:
> > +	/* Workaround for old/broken pHyp */
> > +	if (mac_len == 8)
> > +		mac_addr_p += 2;
> > +	if (mac_len != 6) {
> 
> Missing 'else' before the second if?

Absolutely... oops :-) I couldn't find a version of pHyp with the wrong
property to test with. I suppose I could hack it up in OFW before boot.

I'll fix that and respin, sorry about that.

Cheers,
Ben.

> > +		dev_err(&dev->dev, "VETH_MAC_ADDR attribute wrong len %d\n",
> > +			mac_len);
> > +		return -EINVAL;
> > +	}
> [...]
> 
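
A minimal sketch of the length check with the missing 'else' added. mac_from_property() is a hypothetical helper, not the driver's code; the two-byte skip for the 8-byte form is taken from the quoted patch. Without the 'else', an 8-byte property would pass the first test and then still hit the "!= 6" error path, so the workaround could never take effect.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return a pointer to the 6-byte MAC address inside prop, or NULL if the
 * property length is neither 6 nor the padded 8-byte form.
 */
static const unsigned char *mac_from_property(const unsigned char *prop, int len)
{
	if (len == 8)
		return prop + 2;	/* skip the two leading padding bytes */
	else if (len != 6)
		return NULL;		/* unexpected length: caller reports it */
	return prop;
}
```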

* [PATCH 1/1] powerpc: Force 32 bit MSIs for devices that require it
From: Brian King @ 2013-05-03 21:30 UTC
  To: benh; +Cc: klebers, brking, linuxppc-dev


The following patch implements a new PAPR change which allows
the OS to force the use of 32 bit MSIs, regardless of what
the PCI capabilities indicate. This is required for some
devices that advertise support for 64 bit MSIs but don't
actually support them.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/pci-bridge.h |    2 ++
 arch/powerpc/platforms/pseries/msi.c  |   21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff -puN arch/powerpc/platforms/pseries/msi.c~powerpc_32_bit_msi_papr arch/powerpc/platforms/pseries/msi.c
--- linux/arch/powerpc/platforms/pseries/msi.c~powerpc_32_bit_msi_papr	2013-05-03 11:15:09.000000000 -0500
+++ linux-bjking1/arch/powerpc/platforms/pseries/msi.c	2013-05-03 12:33:11.000000000 -0500
@@ -24,6 +24,7 @@ static int query_token, change_token;
 #define RTAS_RESET_FN		2
 #define RTAS_CHANGE_MSI_FN	3
 #define RTAS_CHANGE_MSIX_FN	4
+#define RTAS_CHANGE_32MSI_FN	5
 
 static struct pci_dn *get_pdn(struct pci_dev *pdev)
 {
@@ -58,7 +59,8 @@ static int rtas_change_msi(struct pci_dn
 
 	seq_num = 1;
 	do {
-		if (func == RTAS_CHANGE_MSI_FN || func == RTAS_CHANGE_MSIX_FN)
+		if (func == RTAS_CHANGE_MSI_FN || func == RTAS_CHANGE_MSIX_FN ||
+		    func == RTAS_CHANGE_32MSI_FN)
 			rc = rtas_call(change_token, 6, 4, rtas_ret, addr,
 					BUID_HI(buid), BUID_LO(buid),
 					func, num_irqs, seq_num);
@@ -426,9 +428,12 @@ static int rtas_setup_msi_irqs(struct pc
 	 */
 again:
 	if (type == PCI_CAP_ID_MSI) {
-		rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, nvec);
+		if (pdn->force_32bit_msi)
+			rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
+		else
+			rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, nvec);
 
-		if (rc < 0) {
+		if (rc < 0 && !pdn->force_32bit_msi) {
 			pr_debug("rtas_msi: trying the old firmware call.\n");
 			rc = rtas_change_msi(pdn, RTAS_CHANGE_FN, nvec);
 		}
@@ -512,3 +517,13 @@ static int rtas_msi_init(void)
 	return 0;
 }
 arch_initcall(rtas_msi_init);
+
+static void quirk_radeon(struct pci_dev *dev)
+{
+	struct pci_dn *pdn = get_pdn(dev);
+
+	if (pdn)
+		pdn->force_32bit_msi = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon);
diff -puN arch/powerpc/include/asm/pci-bridge.h~powerpc_32_bit_msi_papr arch/powerpc/include/asm/pci-bridge.h
--- linux/arch/powerpc/include/asm/pci-bridge.h~powerpc_32_bit_msi_papr	2013-05-03 11:15:09.000000000 -0500
+++ linux-bjking1/arch/powerpc/include/asm/pci-bridge.h	2013-05-03 11:15:09.000000000 -0500
@@ -163,6 +163,8 @@ struct pci_dn {
 
 	int	pci_ext_config_space;	/* for pci devices */
 
+	int	force_32bit_msi:1;
+
 	struct	pci_dev *pcidev;	/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
 	struct eeh_dev *edev;		/* eeh device */
_
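
The selection logic in rtas_setup_msi_irqs() above can be sketched in isolation. This is an illustration only: setup_msi() and old_firmware() are hypothetical, the firmware call is reduced to a function pointer, and the value 1 for RTAS_CHANGE_FN is an assumption (the other two values appear in the patch context).

```c
#include <assert.h>
#include <stdbool.h>

enum {
	RTAS_CHANGE_FN		= 1,	/* assumed value of the old firmware call */
	RTAS_CHANGE_MSI_FN	= 3,
	RTAS_CHANGE_32MSI_FN	= 5,
};

/* Hypothetical firmware interface: returns nvec on success, negative on error. */
typedef int (*change_fn)(int func, int nvec);

/* Hypothetical old firmware that only understands RTAS_CHANGE_FN. */
static int old_firmware(int func, int nvec)
{
	return func == RTAS_CHANGE_FN ? nvec : -1;
}

/*
 * Pick the 32-bit call when the device is flagged; fall back to the old
 * call only when 32-bit MSIs were not required.
 */
static int setup_msi(bool force_32bit, int nvec, change_fn fw)
{
	int rc = fw(force_32bit ? RTAS_CHANGE_32MSI_FN : RTAS_CHANGE_MSI_FN, nvec);

	if (rc < 0 && !force_32bit)
		rc = fw(RTAS_CHANGE_FN, nvec);
	return rc;
}
```

The point of the extra condition is that a flagged device fails outright on firmware lacking the new call, rather than silently falling back to a call that might hand out a 64-bit MSI.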

* Re: [PATCH -V7 09/10] powerpc: Optimize hugepage invalidate
From: Benjamin Herrenschmidt @ 2013-05-03 21:54 UTC
  To: Aneesh Kumar K.V; +Cc: linux-mm, paulus, linuxppc-dev, David Gibson
In-Reply-To: <87fvy351gc.fsf@linux.vnet.ibm.com>

On Sat, 2013-05-04 at 00:35 +0530, Aneesh Kumar K.V wrote:
> 
> if the firmware doesn't support lockless TLBIE, we need to do locking
> at the guest side. pSeries_lpar_flush_hash_range does that.

We don't "need" to ... it's an optimization because by experience the FW
locking was horrible (and the HW locking is too).

Beware however that the hash routines can take a lock too on
"native" (instead of pHyp)...

Ben.

* Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Benjamin Herrenschmidt @ 2013-05-03 22:03 UTC
  To: Alexander Graf
  Cc: Mihai Caraman, linuxppc-dev, kvm@vger.kernel.org VIRTUAL MA...,
	kvm-ppc
In-Reply-To: <981DB739-E50F-488F-B2D8-916FDC2E1749@suse.de>

On Fri, 2013-05-03 at 18:24 +0200, Alexander Graf wrote:
> > There is no reason to exit guest with soft_enabled == 1, a local_irq_enable()
> > call will do this for us so get rid of kvmppc_lazy_ee_enable() calls. With this fix
> > we eliminate irqs_disabled() warnings and some guest and host hangs revealed
> > under stress tests, but guests still exhibit some unresponsiveness.
> > 
> > The unresponsiveness has to do with the fact that arch_local_irq_restore()
> > does not guarantee to hard-enable interrupts. To do so replace exception
> > function calls like timer_interrupt() with irq_happened flags. The
> > local_irq_enable() call takes care of replaying them and lets the interrupts
> > hard enabled.
> > 
> > Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
> 
> Ben, could you please review?

That does look like the right thing to do indeed.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Cheers,
Ben.

* Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Scott Wood @ 2013-05-03 22:06 UTC
  To: Caraman Mihai Claudiu-B02008
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
In-Reply-To: <300B73AA675FCE4A93EB4FC1D42459FF3E9B37@039-SN2MPN1-013.039d.mgd.msft.net>

On 05/03/2013 03:56:47 PM, Caraman Mihai Claudiu-B02008 wrote:
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Friday, May 03, 2013 11:15 PM
> > To: Caraman Mihai Claudiu-B02008
> > Cc: Wood Scott-B07421; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> > linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
> >
> > > > > The unresponsiveness has to do with the fact that
> > > > > arch_local_irq_restore()
> > > > > does not guarantee to hard-enable interrupts.
> > > >
> > > > Could you elaborate?  If the saved IRQ state was "enabled", why
> > > > wouldn't arch_local_irq_restore() hard-enable IRQs?  The last thing it
> > > > does is __hard_irq_enable().
> > >
> > > 	if (!irq_happened)
> > > 		return;
> >
> > OK, so the problem is that we're not setting PACA_IRQ_HARD_DIS when we
> > hard-disable interrupts?
>
> We enter guest with local_irq_disable() which means soft disabled,

Hmm... I don't see any obvious breakage from that, but it makes me
nervous.  I'd be more comfortable if we just hard-disabled interrupts
there.

> when do we hard-disable interrupts?

Interrupts will be hard-disabled when we take an exception to exit
guest state.

> If we follow host exception handlers model
> they set PACA_IRQ_EE/DEC/DBELL but not PACA_IRQ_HARD_DIS. Can you give it
> a try to see how KVM behaves with PACA_IRQ_HARD_DIS? I can't do it right now.

I replaced the two calls to kvmppc_lazy_ee_enable() with calls to
hard_irq_disable(), and it seems to be working fine.

> > > > Where is the arch_local_irq_restore() instance you're talking about?
> > >
> > > ./arch/powerpc/kernel/irq.c
> >
> > I meant the caller. :-P
>
> ./arch/powerpc/include/asm/hw_irq.h
>
> static inline unsigned long arch_local_irq_disable(void)
> {
>         unsigned long flags, zero;
>
>         asm volatile(
>                 "li %1,0; lbz %0,%2(13); stb %1,%2(13)"
>                 : "=r" (flags), "=&r" (zero)
>                 : "i" (offsetof(struct paca_struct, soft_enabled))
>                 : "memory");
>
>         return flags;
> }
>
> extern void arch_local_irq_restore(unsigned long);
>
> static inline void arch_local_irq_enable(void)
> {
>         arch_local_irq_restore(1);
> }

Sigh.  I meant the real caller, who's calling local_irq_restore().

-Scott

* [PATCHv5 0/2] Speed Cap fixes for ppc64
From: Kleber Sacilotto de Souza @ 2013-05-03 22:43 UTC
  To: linuxppc-dev, dri-devel, Benjamin Herrenschmidt, Bjorn Helgaas,
	David Airlie, Michael Ellerman
  Cc: Brian King, Alex Deucher, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Kleber Sacilotto de Souza

This v5 of the patch series is based on v4 sent by Lucas Kannebley Tavares
with a few changes:

  1. Fix a compilation warning on the code from the first patch, where it was
missing a declaration of struct pci_host_bridge, used on the definition of
the function pointer pcibios_root_bridge_prepare() in
arch/powerpc/include/asm/machdep.h.
  2. Incorporate some changes proposed by Tony Breeds in
pseries_root_bridge_prepare().

The following description of the changes was extracted from v4:

This patch series does:
  1. max_bus_speed is used to set the device to gen2 speeds
  2. on power there's no longer a conflict between the pseries call and other
architectures, because the overwrite is done via a ppc_md hook
  3. radeon is using bus->max_bus_speed instead of drm_pcie_get_speed_cap_mask
for gen2 capability detection

The first patch consists of some architecture changes, such as adding a hook on
powerpc for pcibios_root_bridge_prepare, so that pseries will initialize it to a
function while all other architectures get a NULL pointer. That way, whenever
pci_create_root_bus is called, we'll get max_bus_speed properly set up from
OpenFirmware.

The second patch consists of simple radeon changes so that it no longer calls
drm_pcie_get_speed_cap_mask. I assume that on x86 machines,
the max_bus_speed property will be properly set already.

Kleber Sacilotto de Souza (2):
  ppc64: perform proper max_bus_speed detection
  radeon: use max_bus_speed to activate gen2 speeds

 arch/powerpc/include/asm/machdep.h       |    3 ++
 arch/powerpc/kernel/pci-common.c         |    8 ++++
 arch/powerpc/platforms/pseries/pci.c     |   53 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pseries.h |    4 ++
 arch/powerpc/platforms/pseries/setup.c   |    2 +
 drivers/gpu/drm/radeon/evergreen.c       |   10 ++----
 drivers/gpu/drm/radeon/r600.c            |    9 +----
 drivers/gpu/drm/radeon/rv770.c           |    9 +----
 8 files changed, 77 insertions(+), 21 deletions(-)

* [PATCHv5 1/2] ppc64: perform proper max_bus_speed detection
From: Kleber Sacilotto de Souza @ 2013-05-03 22:43 UTC
  To: linuxppc-dev, dri-devel, Benjamin Herrenschmidt, Bjorn Helgaas,
	David Airlie, Michael Ellerman
  Cc: Brian King, Alex Deucher, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Kleber Sacilotto de Souza
In-Reply-To: <1367620993-27037-1-git-send-email-klebers@linux.vnet.ibm.com>

On pseries machines the detection for max_bus_speed should be done
through an OpenFirmware property. This patch adds a function to perform
this detection and a hook to perform dynamic adding of the function only
for pseries. This is done by overwriting the weak
pcibios_root_bridge_prepare function which is called by
pci_create_root_bus().

From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h       |    3 ++
 arch/powerpc/kernel/pci-common.c         |    8 ++++
 arch/powerpc/platforms/pseries/pci.c     |   53 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pseries.h |    4 ++
 arch/powerpc/platforms/pseries/setup.c   |    2 +
 5 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 3f3f691..92386fc 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -29,6 +29,7 @@ struct rtc_time;
 struct file;
 struct pci_controller;
 struct kimage;
+struct pci_host_bridge;
 
 struct machdep_calls {
 	char		*name;
@@ -108,6 +109,8 @@ struct machdep_calls {
 	void		(*pcibios_fixup)(void);
 	int		(*pci_probe_mode)(struct pci_bus *);
 	void		(*pci_irq_fixup)(struct pci_dev *dev);
+	int		(*pcibios_root_bridge_prepare)(struct pci_host_bridge
+				*bridge);
 
 	/* To setup PHBs when using automatic OF platform driver for PCI */
 	int		(*pci_setup_phb)(struct pci_controller *host);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f325dc9..d5811d8 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -845,6 +845,14 @@ int pci_proc_domain(struct pci_bus *bus)
 	return 1;
 }
 
+int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	if (ppc_md.pcibios_root_bridge_prepare)
+		return ppc_md.pcibios_root_bridge_prepare(bridge);
+
+	return 0;
+}
+
 /* This header fixup will do the resource fixup for all devices as they are
  * probed, but not for bridge ranges
  */
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 0b580f4..5f93856 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -108,3 +108,56 @@ static void fixup_winbond_82c105(struct pci_dev* dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105,
 			 fixup_winbond_82c105);
+
+int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	struct device_node *dn, *pdn;
+	struct pci_bus *bus;
+	const uint32_t *pcie_link_speed_stats;
+
+	bus = bridge->bus;
+
+	dn = pcibios_get_phb_of_node(bus);
+	if (!dn)
+		return 0;
+
+	for (pdn = dn; pdn != NULL; pdn = of_get_next_parent(pdn)) {
+		pcie_link_speed_stats = (const uint32_t *) of_get_property(pdn,
+			"ibm,pcie-link-speed-stats", NULL);
+		if (pcie_link_speed_stats)
+			break;
+	}
+
+	of_node_put(pdn);
+
+	if (!pcie_link_speed_stats) {
+		pr_err("no ibm,pcie-link-speed-stats property\n");
+		return 0;
+	}
+
+	switch (pcie_link_speed_stats[0]) {
+	case 0x01:
+		bus->max_bus_speed = PCIE_SPEED_2_5GT;
+		break;
+	case 0x02:
+		bus->max_bus_speed = PCIE_SPEED_5_0GT;
+		break;
+	default:
+		bus->max_bus_speed = PCI_SPEED_UNKNOWN;
+		break;
+	}
+
+	switch (pcie_link_speed_stats[1]) {
+	case 0x01:
+		bus->cur_bus_speed = PCIE_SPEED_2_5GT;
+		break;
+	case 0x02:
+		bus->cur_bus_speed = PCIE_SPEED_5_0GT;
+		break;
+	default:
+		bus->cur_bus_speed = PCI_SPEED_UNKNOWN;
+		break;
+	}
+
+	return 0;
+}
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 8af71e4..c2a3a25 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -63,4 +63,8 @@ extern int dlpar_detach_node(struct device_node *);
 /* Snooze Delay, pseries_idle */
 DECLARE_PER_CPU(long, smt_snooze_delay);
 
+/* PCI root bridge prepare function override for pseries */
+struct pci_host_bridge;
+int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index ac932a9..c11c823 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -466,6 +466,8 @@ static void __init pSeries_setup_arch(void)
 	else
 		ppc_md.enable_pmcs = power4_enable_pmcs;
 
+	ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+
 	if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
 		long rc;
 		if ((rc = pSeries_enable_reloc_on_exc()) != H_SUCCESS) {
-- 
1.7.1

^ permalink raw reply related

* [PATCHv5 2/2] radeon: use max_bus_speed to activate gen2 speeds
From: Kleber Sacilotto de Souza @ 2013-05-03 22:43 UTC (permalink / raw)
  To: linuxppc-dev, dri-devel, Benjamin Herrenschmidt, Bjorn Helgaas,
	David Airlie, Michael Ellerman
  Cc: Brian King, Alex Deucher, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Kleber Sacilotto de Souza
In-Reply-To: <1367620993-27037-1-git-send-email-klebers@linux.vnet.ibm.com>

radeon currently uses a drm function to get the speed capabilities for
the bus, drm_pcie_get_speed_cap_mask. However, this is a non-standard
method of performing this detection and this patch changes it to use
the max_bus_speed attribute.
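
The new check can be illustrated with a small standalone sketch. It
assumes a hypothetical example_bus type and placeholder EX_SPEED_*
codes in place of the kernel's struct pci_bus and enum pci_bus_speed;
only the shape of the test (5.0 GT/s or 8.0 GT/s allows gen2) is taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder codes; the kernel's enum pci_bus_speed uses other values. */
enum example_bus_speed {
	EX_SPEED_2_5GT,
	EX_SPEED_5_0GT,
	EX_SPEED_8_0GT,
	EX_SPEED_UNKNOWN,
};

/* Hypothetical stand-in for struct pci_bus. */
struct example_bus {
	enum example_bus_speed max_bus_speed;
};

/* Gen2 link speeds are attempted only if the bus reports 5.0 or 8.0 GT/s. */
bool bus_allows_gen2(const struct example_bus *bus)
{
	assert(bus != NULL);
	return bus->max_bus_speed == EX_SPEED_5_0GT ||
	       bus->max_bus_speed == EX_SPEED_8_0GT;
}
```

Note that an unknown speed fails the test in this sketch, so gen2 would
stay disabled on any bus whose max_bus_speed was never filled in.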

From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
---
 drivers/gpu/drm/radeon/evergreen.c |   10 +++-------
 drivers/gpu/drm/radeon/r600.c      |    9 ++-------
 drivers/gpu/drm/radeon/rv770.c     |    9 ++-------
 3 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 105bafb..3966696 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -4992,8 +4992,7 @@ void evergreen_fini(struct radeon_device *rdev)
 
 void evergreen_pcie_gen2_enable(struct radeon_device *rdev)
 {
-	u32 link_width_cntl, speed_cntl, mask;
-	int ret;
+	u32 link_width_cntl, speed_cntl;
 
 	if (radeon_pcie_gen2 == 0)
 		return;
@@ -5008,11 +5007,8 @@ void evergreen_pcie_gen2_enable(struct radeon_device *rdev)
 	if (ASIC_IS_X2(rdev))
 		return;
 
-	ret = drm_pcie_get_speed_cap_mask(rdev->ddev, &mask);
-	if (ret != 0)
-		return;
-
-	if (!(mask & DRM_PCIE_SPEED_50))
+	if ((rdev->pdev->bus->max_bus_speed != PCIE_SPEED_5_0GT) &&
+		(rdev->pdev->bus->max_bus_speed != PCIE_SPEED_8_0GT))
 		return;
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 1a08008..b45e648 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -4631,8 +4631,6 @@ static void r600_pcie_gen2_enable(struct radeon_device *rdev)
 {
 	u32 link_width_cntl, lanes, speed_cntl, training_cntl, tmp;
 	u16 link_cntl2;
-	u32 mask;
-	int ret;
 
 	if (radeon_pcie_gen2 == 0)
 		return;
@@ -4651,11 +4649,8 @@ static void r600_pcie_gen2_enable(struct radeon_device *rdev)
 	if (rdev->family <= CHIP_R600)
 		return;
 
-	ret = drm_pcie_get_speed_cap_mask(rdev->ddev, &mask);
-	if (ret != 0)
-		return;
-
-	if (!(mask & DRM_PCIE_SPEED_50))
+	if ((rdev->pdev->bus->max_bus_speed != PCIE_SPEED_5_0GT) &&
+		(rdev->pdev->bus->max_bus_speed != PCIE_SPEED_8_0GT))
 		return;
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c
index 83f612a..a6af4aa 100644
--- a/drivers/gpu/drm/radeon/rv770.c
+++ b/drivers/gpu/drm/radeon/rv770.c
@@ -2113,8 +2113,6 @@ static void rv770_pcie_gen2_enable(struct radeon_device *rdev)
 {
 	u32 link_width_cntl, lanes, speed_cntl, tmp;
 	u16 link_cntl2;
-	u32 mask;
-	int ret;
 
 	if (radeon_pcie_gen2 == 0)
 		return;
@@ -2129,11 +2127,8 @@ static void rv770_pcie_gen2_enable(struct radeon_device *rdev)
 	if (ASIC_IS_X2(rdev))
 		return;
 
-	ret = drm_pcie_get_speed_cap_mask(rdev->ddev, &mask);
-	if (ret != 0)
-		return;
-
-	if (!(mask & DRM_PCIE_SPEED_50))
+	if ((rdev->pdev->bus->max_bus_speed != PCIE_SPEED_5_0GT) &&
+		(rdev->pdev->bus->max_bus_speed != PCIE_SPEED_8_0GT))
 		return;
 
 	DRM_INFO("enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0\n");
-- 
1.7.1

^ permalink raw reply related

* RE: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Caraman Mihai Claudiu-B02008 @ 2013-05-03 22:59 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org
In-Reply-To: <1367618808.19391.11@snotra>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Saturday, May 04, 2013 1:07 AM
> To: Caraman Mihai Claudiu-B02008
> Cc: Wood Scott-B07421; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
>
> I replaced the two calls to kvmppc_lazy_ee_enable() with calls to
> hard_irq_disable(), and it seems to be working fine.

Please take a look at the 'KVM: PPC64: booke: Hard disable interrupts when
entering guest' RFC thread and see if your solution addresses Ben's
comments.

>
> > > > > Where is the arch_local_irq_restore() instance you're talking
> > about?
> > > >
> > > > ./arch/powerpc/kernel/irq.c
> > >
> > > I meant the caller. :-P
> >
> > ./arch/powerpc/include/asm/hw_irq.h
> >
> > static inline unsigned long arch_local_irq_disable(void)
> > {
> >         unsigned long flags, zero;
> >
> >         asm volatile(
> >                 "li %1,0; lbz %0,%2(13); stb %1,%2(13)"
> >                 : "=r" (flags), "=&r" (zero)
> >                 : "i" (offsetof(struct paca_struct, soft_enabled))
> >                 : "memory");
> >
> >         return flags;
> > }
> >
> > extern void arch_local_irq_restore(unsigned long);
> >
> > static inline void arch_local_irq_enable(void)
> > {
> >         arch_local_irq_restore(1);
> > }
>
> Sigh.  I meant the real caller, who's calling local_irq_restore().

I'm not sure what you mean. In our case, arch_local_irq_restore() is called
indirectly by local_irq_enable() from handle_exit().

-Mike

^ permalink raw reply

* Re: [PATCHv5 0/2] Speed Cap fixes for ppc64
From: Benjamin Herrenschmidt @ 2013-05-03 23:01 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: David Airlie, dri-devel, Brian King, Jerome Glisse,
	Thadeu Lima de Souza Cascardo, Bjorn Helgaas, Alex Deucher,
	linuxppc-dev
In-Reply-To: <1367620993-27037-1-git-send-email-klebers@linux.vnet.ibm.com>

On Fri, 2013-05-03 at 19:43 -0300, Kleber Sacilotto de Souza wrote:

> This patch series does:
>   1. max_bus_speed is used to set the device to gen2 speeds
>   2. on power there's no longer a conflict between the pseries call and other
> architectures, because the overwrite is done via a ppc_md hook
>   3. radeon is using bus->max_bus_speed instead of drm_pcie_get_speed_cap_mask
> for gen2 capability detection
> 
> The first patch consists of some architecture changes, such as adding a hook on
> powerpc for pcibios_root_bridge_prepare, so that pseries will initialize it to a
> function, while all other architectures get a NULL pointer. That way, whenever
> pci_create_root_bus is called, we'll get max_bus_speed properly set up from
> OpenFirmware.
> 
> The second patch consists of simple radeon changes not to call
> drm_pcie_get_speed_cap_mask anymore. I assume that on x86 machines,
> the max_bus_speed property will be properly set already.

So I'm ok with the approach now and I might even put the powerpc patch
in for 3.10 since arguably we are fixing a nasty bug (uninitialized
max_bus_speed).

David, what's your feeling about the radeon change ? It would be nice if
that could go in soon for various distro targets :-) On the other hand
I'm not going to be pushy if you are not comfortable with it.

Cheers,
Ben.

^ permalink raw reply

* [PATCH] arch/powerpc: advertise ISA2.07, HTM, DSCR, EBB and ISEL bits in HWCAP2
From: Nishanth Aravamudan @ 2013-05-03 23:19 UTC (permalink / raw)
  To: benh
  Cc: Michael Neuling, Michael R Meissner, sjmunroe, bergner,
	Ryan Arnold, linuxppc-dev

Now that we have AT_HWCAP2 support, start exposing some of the new
POWER8 features via it.
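
For illustration, a userspace program could test these bits roughly as
sketched below. This is an assumption about typical usage, not part of
the patch: the EX_FEATURE2_* names are local copies of the values in
the diff (prefixed to avoid clashing with system headers), hwcap2_has()
and read_hwcap2() are hypothetical helpers, and getauxval() requires
glibc 2.16 or later.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__linux__)
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP2 */
#endif

/* Local copies of the bit values proposed in the patch. */
#define EX_FEATURE2_ARCH_2_07	0x80000000UL
#define EX_FEATURE2_HTM		0x40000000UL
#define EX_FEATURE2_DSCR	0x20000000UL
#define EX_FEATURE2_EBB		0x10000000UL
#define EX_FEATURE2_ISEL	0x08000000UL

/* True if every bit in 'feature' is set in the given HWCAP2 word. */
bool hwcap2_has(unsigned long hwcap2, unsigned long feature)
{
	assert(feature != 0);
	return (hwcap2 & feature) == feature;
}

/* Fetch AT_HWCAP2 from the aux vector; returns 0 where it is unavailable. */
unsigned long read_hwcap2(void)
{
#if defined(__linux__) && defined(AT_HWCAP2)
	return getauxval(AT_HWCAP2);
#else
	return 0;
#endif
}
```

A caller would write something like
hwcap2_has(read_hwcap2(), EX_FEATURE2_HTM). The bit assignments are
architecture-specific, so the result is only meaningful on powerpc.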

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

---
Note: there are, I think, some Freescale processors that also should be
updated to indicate they support ISEL, but I don't know which ones.
Since this is a new feature bit (and vector), it seems like we can fix
that up in a follow-on patch. Also, this is my first patch trying to
manipulate these bits, so please let me know if I'm doing something
wrong (for instance, I don't see any particular order to the bits in
PPC_FEATURE_*)

diff --git a/arch/powerpc/include/uapi/asm/cputable.h b/arch/powerpc/include/uapi/asm/cputable.h
index ed9dd81..78db4e2 100644
--- a/arch/powerpc/include/uapi/asm/cputable.h
+++ b/arch/powerpc/include/uapi/asm/cputable.h
@@ -1,6 +1,7 @@
 #ifndef _UAPI__ASM_POWERPC_CPUTABLE_H
 #define _UAPI__ASM_POWERPC_CPUTABLE_H
 
+/* in AT_HWCAP */
 #define PPC_FEATURE_32			0x80000000
 #define PPC_FEATURE_64			0x40000000
 #define PPC_FEATURE_601_INSTR		0x20000000
@@ -33,4 +34,11 @@
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
 
+/* in AT_HWCAP2 */
+#define PPC_FEATURE2_ARCH_2_07		0x80000000
+#define PPC_FEATURE2_HTM		0x40000000
+#define PPC_FEATURE2_DSCR		0x20000000
+#define PPC_FEATURE2_EBB		0x10000000
+#define PPC_FEATURE2_ISEL		0x08000000
+
 #endif /* _UAPI__ASM_POWERPC_CPUTABLE_H */
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index ae9f433..871c741 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -102,6 +102,9 @@ extern void __restore_cpu_e6500(void);
 				 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
 				 PPC_FEATURE_TRUE_LE | \
 				 PPC_FEATURE_PSERIES_PERFMON_COMPAT)
+#define COMMON_USER2_POWER8	(PPC_FEATURE2_ARCH_2_07 | PPC_FEATURE2_HTM | \
+				 PPC_FEATURE2_DSCR | PPC_FEATURE2_EBB | \
+				 PPC_FEATURE2_ISEL)
 #define COMMON_USER_PA6T	(COMMON_USER_PPC64 | PPC_FEATURE_PA6T |\
 				 PPC_FEATURE_TRUE_LE | \
 				 PPC_FEATURE_HAS_ALTIVEC_COMP)
@@ -443,6 +446,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_name		= "POWER8 (architected)",
 		.cpu_features		= CPU_FTRS_POWER8,
 		.cpu_user_features	= COMMON_USER_POWER8,
+		.cpu_user_features2	= COMMON_USER2_POWER8,
 		.mmu_features		= MMU_FTRS_POWER8,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,
@@ -492,6 +496,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.cpu_name		= "POWER8 (raw)",
 		.cpu_features		= CPU_FTRS_POWER8,
 		.cpu_user_features	= COMMON_USER_POWER8,
+		.cpu_user_features2	= COMMON_USER2_POWER8,
 		.mmu_features		= MMU_FTRS_POWER8,
 		.icache_bsize		= 128,
 		.dcache_bsize		= 128,

^ permalink raw reply related

* Re: [PATCH] arch/powerpc: advertise ISA2.07, HTM, DSCR, EBB and ISEL bits in HWCAP2
From: Benjamin Herrenschmidt @ 2013-05-03 23:23 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Michael Neuling, Michael R Meissner, sjmunroe, bergner,
	Ryan Arnold, linuxppc-dev
In-Reply-To: <20130503231933.GA29436@linux.vnet.ibm.com>

On Fri, 2013-05-03 at 16:19 -0700, Nishanth Aravamudan wrote:
> +/* in AT_HWCAP2 */
> +#define PPC_FEATURE2_ARCH_2_07         0x80000000
> +#define PPC_FEATURE2_HTM               0x40000000
> +#define PPC_FEATURE2_DSCR              0x20000000
> +#define PPC_FEATURE2_EBB               0x10000000
> +#define PPC_FEATURE2_ISEL              0x08000000

Should we "adjust" (ie filter out) some of these based
on CONFIG_ options (such as transactional memory enabled,
EBB supported by the hypervisor, etc...) ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] arch/powerpc: advertise ISA2.07, HTM, DSCR, EBB and ISEL bits in HWCAP2
From: Michael R Meissner @ 2013-05-03 23:26 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: linuxppc-dev, Steve Munroe, Peter Bergner, Ryan Arnold,
	Michael Neuling
In-Reply-To: <20130503231933.GA29436@linux.vnet.ibm.com>

According to the GCC sources, ISEL is enabled by default for the 8540, 
8548, e500mc, e500mc64, e6500 processors.

^ permalink raw reply related

* Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
From: Scott Wood @ 2013-05-03 23:30 UTC (permalink / raw)
  To: Caraman Mihai Claudiu-B02008
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org
In-Reply-To: <300B73AA675FCE4A93EB4FC1D42459FF3E9D81@039-SN2MPN1-013.039d.mgd.msft.net>

On 05/03/2013 05:59:32 PM, Caraman Mihai Claudiu-B02008 wrote:
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Saturday, May 04, 2013 1:07 AM
> > To: Caraman Mihai Claudiu-B02008
> > Cc: Wood Scott-B07421; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> > linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH] KVM: PPC: Book3E 64: Fix IRQs warnings and hangs
> >
> > I replaced the two calls to kvmppc_lazy_ee_enable() with calls to
> > hard_irq_disable(), and it seems to be working fine.
>
> Please take a look on 'KVM: PPC64: booke: Hard disable interrupts when
> entering guest' RFC thread and see if your solution addresses Ben's
> comments.

My original one didn't (there was a race: if an interrupt came in
between soft-disabling and hard-disabling, it wouldn't be received
until the guest exited for some other reason).

Instead, I turned the local_irq_disable() into hard_irq_disable() plus
trace_hardirqs_off().  This worked without warnings.

-Scott

^ permalink raw reply

* Re: [PATCH] arch/powerpc: advertise ISA2.07, HTM, DSCR, EBB and ISEL bits in HWCAP2
From: Nishanth Aravamudan @ 2013-05-03 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Neuling, Michael R Meissner, sjmunroe, bergner,
	Ryan Arnold, linuxppc-dev
In-Reply-To: <1367623431.4389.132.camel@pasglop>

On 04.05.2013 [09:23:51 +1000], Benjamin Herrenschmidt wrote:
> On Fri, 2013-05-03 at 16:19 -0700, Nishanth Aravamudan wrote:
> > +/* in AT_HWCAP2 */
> > +#define PPC_FEATURE2_ARCH_2_07         0x80000000
> > +#define PPC_FEATURE2_HTM               0x40000000
> > +#define PPC_FEATURE2_DSCR              0x20000000
> > +#define PPC_FEATURE2_EBB               0x10000000
> > +#define PPC_FEATURE2_ISEL              0x08000000
> 
> Should we "adjust" (ie filter out) some of these based
> on CONFIG_ options (such as transactional memory enabled,
> EBB supported by the hypervisor, etc...) ?

Err, yeah, that seems reasonable :) However, it seems like glibc uses
these values rather directly so it knows what bits to check for each
feature. Therefore, it seems like it would be better to do the
ifdeffery/checking in the user in cputable.c, but that seems like it
could get quite complicated.

Would it be ok (I guess I'm asking Ryan & co. here) to have an #ifdef in
the definition that may or may not mean the bit is set in the aux
vector, but the bit, if set, would always be the same bit?
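
One possible reading of this idea, sketched under my own assumptions:
the bit values stay unconditional so userspace can always test the same
positions, while the mask that ends up in the aux vector is built
conditionally. EX_CONFIG_TM and the hypervisor_has_ebb parameter are
hypothetical switches invented for this example, not existing kernel
options.

```c
#include <assert.h>

/* Bit positions are fixed regardless of configuration. */
#define EX_FEATURE2_ARCH_2_07	0x80000000UL
#define EX_FEATURE2_HTM		0x40000000UL
#define EX_FEATURE2_DSCR	0x20000000UL
#define EX_FEATURE2_EBB		0x10000000UL
#define EX_FEATURE2_ISEL	0x08000000UL

/* Hypothetical build-time switch; left undefined in this sketch. */
/* #define EX_CONFIG_TM */

#ifdef EX_CONFIG_TM
#define EX_USER2_HTM	EX_FEATURE2_HTM
#else
#define EX_USER2_HTM	0UL	/* feature compiled out: bit never advertised */
#endif

/* Build the advertised mask; EBB is added only if the runtime check passes. */
unsigned long example_user_features2(int hypervisor_has_ebb)
{
	unsigned long mask = EX_FEATURE2_ARCH_2_07 | EX_USER2_HTM |
			     EX_FEATURE2_DSCR | EX_FEATURE2_ISEL;

	assert(hypervisor_has_ebb == 0 || hypervisor_has_ebb == 1);
	if (hypervisor_has_ebb)
		mask |= EX_FEATURE2_EBB;
	return mask;
}
```

Either way the HTM bit, when present, is always 0x40000000, so a
consumer that hard-codes the positions would not need to change.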

-Nish

^ permalink raw reply

* [PATCH] kvm/ppc/booke64: Hard disable interrupts when entering the guest
From: Scott Wood @ 2013-05-03 23:45 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Scott Wood, Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

kvmppc_lazy_ee_enable() was causing interrupts to be soft-enabled
(albeit hard-disabled) in kvmppc_restart_interrupt().  This led to
warnings, and possibly breakage if the interrupt state was later saved
and then restored (leading to interrupts being hard-and-soft enabled
when they should be at least soft-disabled).

Simply removing kvmppc_lazy_ee_enable() leaves interrupts only
soft-disabled when we enter the guest, but they will be hard-disabled
when we exit the guest -- without PACA_IRQ_HARD_DIS ever being set, so
the local_irq_enable() fails to hard-enable.

While we could just set PACA_IRQ_HARD_DIS after an exit to compensate,
instead hard-disable interrupts before entering the guest.  This way,
we won't have to worry about interactions if we take an interrupt
during the guest entry code.  While I don't see any obvious
interactions, it could change in the future (e.g. it would be bad if
the non-hv code were used on 64-bit or if 32-bit guest lazy interrupt
disabling, since the non-hv code changes IVPR among other things).
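
The failure mode described above can be reproduced in a toy userspace
model. This is a deliberately simplified sketch, not the real powerpc
lazy-interrupt code: struct ex_cpu, the ex_* helpers and
EX_IRQ_HARD_DIS are hypothetical stand-ins for the PACA fields,
local_irq_disable()/hard_irq_disable()/local_irq_enable() and
PACA_IRQ_HARD_DIS, and interrupt replay is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_IRQ_HARD_DIS	0x01	/* stand-in for PACA_IRQ_HARD_DIS */

struct ex_cpu {
	bool hw_enabled;	/* hardware interrupt-enable bit */
	bool soft_enabled;	/* lazy, software-only flag */
	unsigned irq_happened;	/* bookkeeping consulted on soft re-enable */
};

/* Soft disable: only the software flag changes. */
void ex_local_irq_disable(struct ex_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->soft_enabled = false;
}

/* Hard disable: hardware bit cleared AND the fact is recorded. */
void ex_hard_irq_disable(struct ex_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->hw_enabled = false;
	cpu->soft_enabled = false;
	cpu->irq_happened |= EX_IRQ_HARD_DIS;
}

/* Soft enable: hardware is re-enabled only if something was recorded. */
void ex_local_irq_enable(struct ex_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->soft_enabled = true;
	if (cpu->irq_happened == 0)
		return;		/* nothing recorded: hardware bit untouched */
	cpu->irq_happened = 0;
	cpu->hw_enabled = true;
}

/* Guest entry and exit: the host resumes with hardware interrupts off. */
void ex_guest_round_trip(struct ex_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->hw_enabled = false;
}
```

In this model, soft-disabling before the guest round trip leaves the
hardware bit off after ex_local_irq_enable(), while hard-disabling
first lets the enable path restore it.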

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/kvm/booke.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ecbe908..b216821 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -666,14 +666,14 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}

-	local_irq_disable();
+	hard_irq_disable();
+	trace_hardirqs_off();
 	s = kvmppc_prepare_to_enter(vcpu);
 	if (s <= 0) {
 		local_irq_enable();
 		ret = s;
 		goto out;
 	}
-	kvmppc_lazy_ee_enable();

 	kvm_guest_enter();

@@ -1150,13 +1150,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * aren't already exiting to userspace for some other reason.
 	 */
 	if (!(r & RESUME_HOST)) {
-		local_irq_disable();
+		hard_irq_disable();
+		trace_hardirqs_off();
 		s = kvmppc_prepare_to_enter(vcpu);
 		if (s <= 0) {
 			local_irq_enable();
 			r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
-		} else {
-			kvmppc_lazy_ee_enable();
 		}
 	}

-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] kvm/ppc/booke64: Hard disable interrupts when entering the guest
From: Scott Wood @ 2013-05-03 23:53 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, Alexander Graf, kvm-ppc, kvm
In-Reply-To: <1367624723-22456-1-git-send-email-scottwood@freescale.com>

On 05/03/2013 06:45:23 PM, Scott Wood wrote:
> While we could just set PACA_IRQ_HARD_DIS after an exit to compensate,
> instead hard-disable interrupts before entering the guest.  This way,
> we won't have to worry about interactions if we take an interrupt
> during the guest entry code.  While I don't see any obvious
> interactions, it could change in the future (e.g. it would be bad if
> the non-hv code were used on 64-bit or if 32-bit guest lazy interrupt
> disabling, since the non-hv code changes IVPR among other things).

s/32-bit guest lazy/32-bit gets lazy/

-Scott

^ permalink raw reply
