LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2] panic: Make panic_timeout configurable
From: Michael Ellerman @ 2013-11-22  1:54 UTC (permalink / raw)
  To: Jason Baron
  Cc: Felipe Contreras, linux-kernel@vger.kernel.org,
	ralf@linux-mips.org, paulus@samba.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org, Ingo Molnar
In-Reply-To: <528E795A.6070600@akamai.com>

On Thu, Nov 21, 2013 at 04:21:30PM -0500, Jason Baron wrote:
> On 11/21/2013 06:16 AM, Michael Ellerman wrote:
> > On Tue, Nov 19, 2013 at 05:04:14PM -0500, Jason Baron wrote:
> >> On 11/19/2013 02:09 AM, Ingo Molnar wrote:
> >>>
> >>> * Jason Baron <jbaron@akamai.com> wrote:
> >>>
> >>>> On 11/18/2013 05:30 PM, Andrew Morton wrote:
> >>>>> On Mon, 18 Nov 2013 21:04:36 +0000 (GMT) Jason Baron <jbaron@akamai.com> wrote:
> >>>>>
> >>>>>> The panic_timeout value can be set via the command line option 'panic=x', or via
> >>>>>> /proc/sys/kernel/panic, however that is not sufficient when the panic occurs
> >>>>>> before we are able to set up these values. Thus, add a CONFIG_PANIC_TIMEOUT
> >>>>>> so that we can set the desired value from the .config.
> >>>>>>
> >>>>>> The default panic_timeout value continues to be 0 - wait forever, 
> >>>>>> except for powerpc and mips, which have been defaulted to 180 and 
> >>>>>> 5 respectively. This is in keeping with the fact that these 
> >>>>>> arches already set panic_timeout in their arch init code. 
> >>>>>> However, I found three exceptions- two in mips and one in powerpc 
> >>>>>> where the settings didn't match these default values. In those 
> >>>>>> cases, I left the arch code so it continues to override, in case 
> >>>>>> the user has not changed from the default. It would nice if these 
> >>>>>> arches had one default value, or if we could determine the 
> >>>>>> correct setting at compile-time.
> > ...
> >>
> >> Sure, I can round up all the related patches in this area that make
> >> sense and re-submit as a series.
> >>
> >> Felipe, would the CONFIG_PANIC_TIMEOUT=xx .config parameter work for your
> >> needs, or would you still like to see the command-line processing moved
> >> up?
> >>
> >> I'd also like to hear from the PowerPC folks about the arch defaults
> >> there. Now, that mips is ok with CONFIG_PANIC_TIMEOUT, PowerPC is the
> >> only arch doing specific initialization of 'panic_timeout'.
> > 
> > Hi Jason,
> > 
> > I think we'd like to choose the value at runtime, as we do now. The
> > powerpc arch supports a wide spread of different hardware, so it's nice
> > to be able to customise the value based on the platform. Also we build a
> > single kernel that boots on many platforms, and so we can't pick the
> > value at compile time.
> 
> Hi,
> 
> Ok, So powerpc sets the timeout to '180' during setup_arch(), but then
> overrides the value to '10' only for pSeries. 

Yep.

> The patch proposed in this thread, sets the default built-in value for
> powerpc to 180 and then continues to override in pSeries, if its
> still 180 (IE the user hasn't requested another value). This allows
> the panic_timeout value to have an effect before we reach the
> arch_init() code. Are you ok with this? That is, are you ok with the
> proposed powerpc bits?

Yes that looks OK to me.

It means we get a non-zero value during early boot, which is nice, the
user can override the value if they wish, and otherwise the existing
behaviour is unchanged.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/signals: Mark VSX not saved with small contexts
From: Carlos O'Donell @ 2013-11-22  0:56 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Steve Best, linuxppc-dev, Haren Myneni
In-Reply-To: <528EAAEE.8070300@redhat.com>

On 11/21/2013 07:53 PM, Carlos O'Donell wrote:
> The addition of the *context functions in glibc for 64-bit power
> happened in 2003 by glibc commit 609b4783, with the mcontext_t
> being expanded to include 

... support for VMX state via `long vmx_reserve[NVRREG+NVRREG+1];'.

Cheers,
Carlos.

^ permalink raw reply

* Re: [PATCH] powerpc/signals: Mark VSX not saved with small contexts
From: Carlos O'Donell @ 2013-11-22  0:53 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Steve Best, linuxppc-dev, Haren Myneni
In-Reply-To: <8900.1385072483@ale.ozlabs.ibm.com>

On 11/21/2013 05:21 PM, Michael Neuling wrote:
>>> What about the 64-bit code? I don't know the code but it appears at a glance to
>>> have the same bug.
>>  
>> It doesn't happen with 64-bit code because there the context contains
>> a sigcontext which on ppc64 has vmx_reserve to store the entire VMX
>> state. Therefore 64-bit ppc always has space to store the VMX registers
>> in a userspace context switch. It is only the 32-bit ppc ABI that lacks
>> the space.
> 
> VMX?  I don't understand this at all.  We extended the ucontext to
> handle the extra VSX state, so older code may still be using the small
> ucontext and we already have a bunch of checks in the 64bit case for
> this.
> 
> I agree with Michael, we should add this to the 64 bit case.  If we
> can't put VSX state in, then clear MSR VSX.

Sorry, typo, VSX not VMX.

I had not gone through the historical implementation of the 64-bit
code, I assumed it started with a sufficiently sized context structure,
but on closer review it didn't.

The addition of the *context functions in glibc for 64-bit power
happened in 2003 by glibc commit 609b4783, with the mcontext_t
being expanded to include 

I see that the 64-bit userspace context was extended in 2008 by your 
kernel commit ce48b210.

Thus you're right the check is needed in the 64-bit case.

However, at present the issue doesn't seem to trigger in the
64-bit userspace. Which is odd now that I review the code and 
see that the 64-bit userspace context is smaller than the kernel
context (lacks the `+32' to the vmx_reserve space). It could just
be that the compiler finds no chance to use VSX and therefore
the existing test cases don't trigger the bug. I don't plan to
investigate this further given that we're going to fix the 64-bit case
also.
 
> So how about we make it that simple and put it independent of the other
> if statement?
> 
> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> index 749778e..f4a7fd4 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -458,6 +458,14 @@ static int save_user_regs(struct pt_regs *regs, struct mcon
>                         return 1;
>                 msr |= MSR_VSX;
>         }
> +       /*
> +        * With a small context structure we can't hold the VSX registers,
> +        * hence clear the MSR value to indicate the state was not saved.
> +        */
> +       if (!ctx_has_vsx_region)
> +               msr &= ~MSR_VSX;
> +
> +
>  #endif /* CONFIG_VSX */
>  #ifdef CONFIG_SPE
>         /* save spe registers */
> 

Looks good to me, along with a similar fix for signal_64.c.

Cheers,
Carlos.

^ permalink raw reply

* [PATCH] powerpc/booke: Only check for hugetlb in flush if vma != NULL
From: Scott Wood @ 2013-11-22  0:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Scott Wood

And in flush_hugetlb_page(), don't check whether vma is NULL after
we've already dereferenced it.

This was found by Dan using static analysis as described here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-November/113161.html

We currently get away with this because the callers that currently pass
NULL for vma seem to be 32-bit-only (e.g. highmem, and
CONFIG_DEBUG_PGALLOC in pgtable_32.c)  Hugetlb is currently 64-bit only,
so we never saw a NULL vma here.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
---
 arch/powerpc/mm/hugetlbpage-book3e.c | 3 +--
 arch/powerpc/mm/tlb_nohash.c         | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-book3e.c b/arch/powerpc/mm/hugetlbpage-book3e.c
index 3bc7006..74551b5 100644
--- a/arch/powerpc/mm/hugetlbpage-book3e.c
+++ b/arch/powerpc/mm/hugetlbpage-book3e.c
@@ -117,6 +117,5 @@ void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 	struct hstate *hstate = hstate_file(vma->vm_file);
 	unsigned long tsize = huge_page_shift(hstate) - 10;
 
-	__flush_tlb_page(vma ? vma->vm_mm : NULL, vmaddr, tsize, 0);
-
+	__flush_tlb_page(vma->vm_mm, vmaddr, tsize, 0);
 }
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 41cd68d..358d743 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -305,7 +305,7 @@ void __flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 {
 #ifdef CONFIG_HUGETLB_PAGE
-	if (is_vm_hugetlb_page(vma))
+	if (vma && is_vm_hugetlb_page(vma))
 		flush_hugetlb_page(vma, vmaddr);
 #endif
 
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH 2/2] powerpc: mm: change pgtable index size for 64K page
From: Benjamin Herrenschmidt @ 2013-11-21 23:46 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <20131121221150.GB26359@iris.ozlabs.ibm.com>

On Fri, 2013-11-22 at 09:11 +1100, Paul Mackerras wrote:
> On Thu, Nov 21, 2013 at 10:17:55AM +0800, Liu Ping Fan wrote:
> > For 64K page, we waste half of the pte_t page. With this patch, after
> > changing PGD_INDEX_SIZE from 12 to 11, PTE_INDEX_SIZE from 8 to 9,
> > we can improve the usage of pte_t page and shrink the continuous phys
> > size for pgd_t.

Also you did you miss that we use the second half to store the
per-subpage hash info when using 64k on top of HW 4k ?

Cheers,
Ben.

> > Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/pgtable-ppc64-64k.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
> > index a56b82f..f6955ff 100644
> > --- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
> > +++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
> > @@ -4,10 +4,10 @@
> >  #include <asm-generic/pgtable-nopud.h>
> >  
> >  
> > -#define PTE_INDEX_SIZE  8
> > +#define PTE_INDEX_SIZE  9
> >  #define PMD_INDEX_SIZE  10
> >  #define PUD_INDEX_SIZE	0
> > -#define PGD_INDEX_SIZE  12
> > +#define PGD_INDEX_SIZE  11
> 
> Nack.  Those definitions are the way they are in order to have the PMD
> map 16MB, which is our large page size, so that transparent huge pages
> work.
> 
> Paul.

^ permalink raw reply

* Re: [PATCH] powerpc/signals: Mark VSX not saved with small contexts
From: Michael Neuling @ 2013-11-21 22:21 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Michael Ellerman, Steve Best, linuxppc-dev, Haren Myneni
In-Reply-To: <528E2EB6.2010003@redhat.com>

Carlos O'Donell <carlos@redhat.com> wrote:

> On 11/21/2013 06:33 AM, Michael Ellerman wrote:
> > On Wed, Nov 20, 2013 at 04:18:54PM +1100, Michael Neuling wrote:
> >> The VSX MSR bit in the user context indicates if the context contains VSX
> >> state.  Currently we set this when the process has touched VSX at any stage.
> >>
> >> Unfortunately, if the user has not provided enough space to save the VSX state,
> >> we can't save it but we currently still set the MSR VSX bit.
> >>
> >> This patch changes this to clear the MSR VSX bit when the user doesn't provide
> >> enough space.  This indicates that there is no valid VSX state in the user
> >> context.
> >>
> >> This is needed to support get/set/make/swapcontext for applications that use
> >> VSX but only provide a small context.  For example, getcontext in glibc
> >> provides a smaller context since the VSX registers don't need to be saved over
> >> the glibc function call.  But since the program calling getcontext may have
> >> used VSX, the kernel currently says the VSX state is valid when it's not.  If
> >> the returned context is then used in setcontext (ie. a small context without
> >> VSX but with MSR VSX set), the kernel will refuse the context.  This situation
> >> has been reported by the glibc community.
> > 
> > Broken since forever?
> 
> Yes, broken since forever. At least it was known in glibc 2.18 that this was
> broken, but without an active distribution using it the defect wasn't analyzed.
> 
> >> Tested-by: Haren Myneni <haren@linux.vnet.ibm.com>
> >> Signed-off-by: Michael Neuling <mikey@neuling.org>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>  arch/powerpc/kernel/signal_32.c | 10 +++++++++-
> > 
> > What about the 64-bit code? I don't know the code but it appears at a glance to
> > have the same bug.
>  
> It doesn't happen with 64-bit code because there the context contains
> a sigcontext which on ppc64 has vmx_reserve to store the entire VMX
> state. Therefore 64-bit ppc always has space to store the VMX registers
> in a userspace context switch. It is only the 32-bit ppc ABI that lacks
> the space.

VMX?  I don't understand this at all.  We extended the ucontext to
handle the extra VSX state, so older code may still be using the small
ucontext and we already have a bunch of checks in the 64bit case for
this.

I agree with Michael, we should add this to the 64 bit case.  If we
can't put VSX state in, then clear MSR VSX.

>  
> >> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> >> index 749778e..1844298 100644
> >> --- a/arch/powerpc/kernel/signal_32.c
> >> +++ b/arch/powerpc/kernel/signal_32.c
> >> @@ -457,7 +457,15 @@ static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
> >>  		if (copy_vsx_to_user(&frame->mc_vsregs, current))
> >>  			return 1;
> >>  		msr |= MSR_VSX;
> >> -	}
> >> +	} else if (!ctx_has_vsx_region)
> >> +		/*
> >> +		 * With a small context structure we can't hold the VSX
> >> +		 * registers, hence clear the MSR value to indicate the state
> >> +		 * was not saved.
> >> +		 */
> >> +		msr &= ~MSR_VSX;
> > 
> > I think it'd be clearer if this was just the else case. The full context is:
> > 
> >     if (current->thread.used_vsr && ctx_has_vsx_region) {
> >             __giveup_vsx(current);
> >             if (copy_vsx_to_user(&frame->mc_vsregs, current))
> >                     return 1;
> >             msr |= MSR_VSX;
> > +   } else if (!ctx_has_vsx_region)
> > +           /*
> > +            * With a small context structure we can't hold the VSX
> > +            * registers, hence clear the MSR value to indicate the state
> > +            * was not saved.
> > +            */
> > +           msr &= ~MSR_VSX;
> > 
> > Which means if current->thread.user_vsr and ctx_has_vsx_region are both false
> > we potentially leave MSR_VSX set in msr. I think it should be the case that
> > MSR_VSX is only ever set if current->thread.used_vsr is true, so it should be
> > OK in pratice, but it seems unnecessarily fragile.
> 
> If current->thread.user_vsr and ctx_has_vsx_region are both false then
> !ctx_has_vsx_region is true and we clear MSR_VSX.
> 
> Perhaps you mean if current->thread.user_vsr is false, but ctx_has_vsx_region
> is true? 
> 
> Previously the else clause reset MSR_VSX if:
> 1. current->thread.used_vsr == 0 && ctx_has_vsx_region == 0
> 2. current->thread.used_vsr == 1 && ctx_has_vsx_region == 0,
> 
> Now it resets MSR_VSX additionally for:
> 3. current->thread.used_vsr == 0 && ctx_has_vsx_region == 1,
> 
> 3. is a valid state. The task has not touched the VSX state and the context
> is large enough to be saved into. This may be a future state for ppc32 if 
> we adjust the ABI to have a large enough context buffer. However at present 
> it's not a plausible state. It's also a "don't care" state since there is 
> nothing saved in the context, and if nothing was saved in the context then
> MSR_VSX is not set.

This makes my head hurt.

MSR VSX needs to be cleared always if there is no VSX region.  It's
independant of used_vsr, so let's make that clear in the code.

>  
> > The logic should be "if we write VSX we set MSR_VSX, else we clear MSR_VSX", ie:
> > 
> >     if (current->thread.used_vsr && ctx_has_vsx_region) {
> >             __giveup_vsx(current);
> >             if (copy_vsx_to_user(&frame->mc_vsregs, current))
> >                     return 1;
> >             msr |= MSR_VSX;
> >     } else
> >             msr &= ~MSR_VSX;
> 
> If anything I dislike this because it might mask a bug in earlier code that
> might erroneously set MSR_VSX even though current->thread.user_vsr is not
> true. If anything the extra state 3. covered here is a buggy state.
> 
> I agree that your suggestion is more robust though since the definition of
> robustness is to operate despite failures.

The basic idea of the patch is that if the user hasn't passed a VSX
region, then we clear MSR VSX to indicate there is no VSX data.
It's independant of used_vsr.

So how about we make it that simple and put it independent of the other
if statement?

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 749778e..f4a7fd4 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -458,6 +458,14 @@ static int save_user_regs(struct pt_regs *regs, struct mcon
                        return 1;
                msr |= MSR_VSX;
        }
+       /*
+        * With a small context structure we can't hold the VSX registers,
+        * hence clear the MSR value to indicate the state was not saved.
+        */
+       if (!ctx_has_vsx_region)
+               msr &= ~MSR_VSX;
+
+
 #endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
        /* save spe registers */

^ permalink raw reply related

* Re: [PATCH 2/2] powerpc: mm: change pgtable index size for 64K page
From: Paul Mackerras @ 2013-11-21 22:11 UTC (permalink / raw)
  To: Liu Ping Fan; +Cc: linuxppc-dev
In-Reply-To: <1385000275-5988-2-git-send-email-pingfank@linux.vnet.ibm.com>

On Thu, Nov 21, 2013 at 10:17:55AM +0800, Liu Ping Fan wrote:
> For 64K page, we waste half of the pte_t page. With this patch, after
> changing PGD_INDEX_SIZE from 12 to 11, PTE_INDEX_SIZE from 8 to 9,
> we can improve the usage of pte_t page and shrink the continuous phys
> size for pgd_t.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/pgtable-ppc64-64k.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64-64k.h b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
> index a56b82f..f6955ff 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64-64k.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64-64k.h
> @@ -4,10 +4,10 @@
>  #include <asm-generic/pgtable-nopud.h>
>  
>  
> -#define PTE_INDEX_SIZE  8
> +#define PTE_INDEX_SIZE  9
>  #define PMD_INDEX_SIZE  10
>  #define PUD_INDEX_SIZE	0
> -#define PGD_INDEX_SIZE  12
> +#define PGD_INDEX_SIZE  11

Nack.  Those definitions are the way they are in order to have the PMD
map 16MB, which is our large page size, so that transparent huge pages
work.

Paul.

^ permalink raw reply

* Re: [PATCH v2] panic: Make panic_timeout configurable
From: Jason Baron @ 2013-11-21 21:21 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Felipe Contreras, linux-kernel@vger.kernel.org,
	ralf@linux-mips.org, paulus@samba.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org, Ingo Molnar
In-Reply-To: <20131121111638.GA15913@concordia>

On 11/21/2013 06:16 AM, Michael Ellerman wrote:
> On Tue, Nov 19, 2013 at 05:04:14PM -0500, Jason Baron wrote:
>> On 11/19/2013 02:09 AM, Ingo Molnar wrote:
>>>
>>> * Jason Baron <jbaron@akamai.com> wrote:
>>>
>>>> On 11/18/2013 05:30 PM, Andrew Morton wrote:
>>>>> On Mon, 18 Nov 2013 21:04:36 +0000 (GMT) Jason Baron <jbaron@akamai.com> wrote:
>>>>>
>>>>>> The panic_timeout value can be set via the command line option 'panic=x', or via
>>>>>> /proc/sys/kernel/panic, however that is not sufficient when the panic occurs
>>>>>> before we are able to set up these values. Thus, add a CONFIG_PANIC_TIMEOUT
>>>>>> so that we can set the desired value from the .config.
>>>>>>
>>>>>> The default panic_timeout value continues to be 0 - wait forever, 
>>>>>> except for powerpc and mips, which have been defaulted to 180 and 
>>>>>> 5 respectively. This is in keeping with the fact that these 
>>>>>> arches already set panic_timeout in their arch init code. 
>>>>>> However, I found three exceptions- two in mips and one in powerpc 
>>>>>> where the settings didn't match these default values. In those 
>>>>>> cases, I left the arch code so it continues to override, in case 
>>>>>> the user has not changed from the default. It would nice if these 
>>>>>> arches had one default value, or if we could determine the 
>>>>>> correct setting at compile-time.
> ...
>>
>> Sure, I can round up all the related patches in this area that make
>> sense and re-submit as a series.
>>
>> Felipe, would the CONFIG_PANIC_TIMEOUT=xx .config parameter work for your
>> needs, or would you still like to see the command-line processing moved
>> up?
>>
>> I'd also like to hear from the PowerPC folks about the arch defaults
>> there. Now, that mips is ok with CONFIG_PANIC_TIMEOUT, PowerPC is the
>> only arch doing specific initialization of 'panic_timeout'.
> 
> Hi Jason,
> 
> I think we'd like to choose the value at runtime, as we do now. The
> powerpc arch supports a wide spread of different hardware, so it's nice
> to be able to customise the value based on the platform. Also we build a
> single kernel that boots on many platforms, and so we can't pick the
> value at compile time.
> 
> cheers
> 

Hi,

Ok, So powerpc sets the timeout to '180' during setup_arch(), but then
overrides the value to '10' only for pSeries. The patch proposed in this
thread, sets the default built-in value for powerpc to 180 and then
continues to override in pSeries, if its still 180 (IE the user hasn't
requested another value). This allows the panic_timeout value to have
an effect before we reach the arch_init() code. Are you ok with this?
That is, are you ok with the proposed powerpc bits?

Or should we drop the powerpc default 180 Kconfig setting, such that
it continues to be 0 until the arch code is run? And in that case
only apply the arch defaults if still 0.

Thanks,

-Jason 

^ permalink raw reply

* Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
From: Alex Williamson @ 2013-11-21 21:00 UTC (permalink / raw)
  To: Scott Wood
  Cc: linux-pci@vger.kernel.org, agraf@suse.de, Stuart Yoder,
	Bharat Bhushan, iommu@lists.linux-foundation.org,
	bhelgaas@google.com, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1385066835.1403.489.camel@snotra.buserror.net>

On Thu, 2013-11-21 at 14:47 -0600, Scott Wood wrote:
> On Thu, 2013-11-21 at 13:43 -0700, Alex Williamson wrote:
> > On Thu, 2013-11-21 at 11:20 +0000, Bharat Bhushan wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Thursday, November 21, 2013 12:17 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: joro@8bytes.org; bhelgaas@google.com; agraf@suse.de; Wood Scott-B07421;
> > > > Yoder Stuart-B08248; iommu@lists.linux-foundation.org; linux-
> > > > pci@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> > > > kernel@vger.kernel.org; Bhushan Bharat-R65777
> > > > Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
> > > > 
> > > > Is VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie. each vfio user has
> > > > $COUNT regions at their disposal exclusively)?
> > > 
> > > Number of msi-bank count is system wide and not per aperture, But will be setting windows for banks in the device aperture.
> > > So say if we are direct assigning 2 pci device (both have different iommu group, so 2 aperture in iommu) to VM.
> > > Now qemu can make only one call to know how many msi-banks are there but it must set sub-windows for all banks for both pci device in its respective aperture.
> > 
> > I'm still confused.  What I want to make sure of is that the banks are
> > independent per aperture.  For instance, if we have two separate
> > userspace processes operating independently and they both chose to use
> > msi bank zero for their device, that's bank zero within each aperture
> > and doesn't interfere.  Or another way to ask is can a malicious user
> > interfere with other users by using the wrong bank.  Thanks,
> 
> They can interfere.  With this hardware, the only way to prevent that is
> to make sure that a bank is not shared by multiple protection contexts.
> For some of our users, though, I believe preventing this is less
> important than the performance benefit.

I think we need some sort of ownership model around the msi banks then.
Otherwise there's nothing preventing another userspace from attempting
an MSI based attack on other users, or perhaps even on the host.  VFIO
can't allow that.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
From: Scott Wood @ 2013-11-21 20:47 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-pci@vger.kernel.org, agraf@suse.de, Stuart Yoder,
	Bharat Bhushan, iommu@lists.linux-foundation.org,
	bhelgaas@google.com, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1385066603.2879.414.camel@ul30vt.home>

On Thu, 2013-11-21 at 13:43 -0700, Alex Williamson wrote:
> On Thu, 2013-11-21 at 11:20 +0000, Bharat Bhushan wrote:
> > 
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Thursday, November 21, 2013 12:17 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: joro@8bytes.org; bhelgaas@google.com; agraf@suse.de; Wood Scott-B07421;
> > > Yoder Stuart-B08248; iommu@lists.linux-foundation.org; linux-
> > > pci@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> > > kernel@vger.kernel.org; Bhushan Bharat-R65777
> > > Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
> > > 
> > > Is VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie. each vfio user has
> > > $COUNT regions at their disposal exclusively)?
> > 
> > Number of msi-bank count is system wide and not per aperture, But will be setting windows for banks in the device aperture.
> > So say if we are direct assigning 2 pci device (both have different iommu group, so 2 aperture in iommu) to VM.
> > Now qemu can make only one call to know how many msi-banks are there but it must set sub-windows for all banks for both pci device in its respective aperture.
> 
> I'm still confused.  What I want to make sure of is that the banks are
> independent per aperture.  For instance, if we have two separate
> userspace processes operating independently and they both chose to use
> msi bank zero for their device, that's bank zero within each aperture
> and doesn't interfere.  Or another way to ask is can a malicious user
> interfere with other users by using the wrong bank.  Thanks,

They can interfere.  With this hardware, the only way to prevent that is
to make sure that a bank is not shared by multiple protection contexts.
For some of our users, though, I believe preventing this is less
important than the performance benefit.

-Scott

^ permalink raw reply

* Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
From: Alex Williamson @ 2013-11-21 20:43 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: linux-pci@vger.kernel.org, joro@8bytes.org, agraf@suse.de,
	Stuart Yoder, Scott Wood, iommu@lists.linux-foundation.org,
	bhelgaas@google.com, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D0721D9AC@039-SN2MPN1-012.039d.mgd.msft.net>

On Thu, 2013-11-21 at 11:20 +0000, Bharat Bhushan wrote:
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Thursday, November 21, 2013 12:17 AM
> > To: Bhushan Bharat-R65777
> > Cc: joro@8bytes.org; bhelgaas@google.com; agraf@suse.de; Wood Scott-B07421;
> > Yoder Stuart-B08248; iommu@lists.linux-foundation.org; linux-
> > pci@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> > kernel@vger.kernel.org; Bhushan Bharat-R65777
> > Subject: Re: [PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)
> > 
> > On Tue, 2013-11-19 at 10:47 +0530, Bharat Bhushan wrote:
> > > From: Bharat Bhushan <bharat.bhushan@freescale.com>
> > >
> > > PAMU (FSL IOMMU) has a concept of primary window and subwindows.
> > > Primary window corresponds to the complete guest iova address space
> > > (including MSI space), with respect to IOMMU_API this is termed as
> > > geometry. IOVA Base of subwindow is determined from the number of
> > > subwindows (configurable using iommu API).
> > > MSI I/O page must be within the geometry and maximum supported
> > > subwindows, so MSI IO-page is setup just after guest memory iova space.
> > >
> > > So patch 1/9-4/9(inclusive) are for defining the interface to get:
> > >   - Number of MSI regions (which is number of MSI banks for powerpc)
> > >   - MSI-region address range: Physical page which have the
> > >     address/addresses used for generating MSI interrupt
> > >     and size of the page.
> > >
> > > Patch 5/9-7/9(inclusive) is defining the interface of setting up MSI
> > > iova-base for a msi region(bank) for a device. so that when
> > > msi-message will be composed then this configured iova will be used.
> > > Earlier we were using iommu interface for getting the configured iova
> > > which was not currect and Alex Williamson suggeested this type of interface.
> > >
> > > patch 8/9 moves some common functions in a separate file so that these
> > > can be used by FSL_PAMU implementation (next patch uses this).
> > > These will be used later for iommu-none implementation. I believe we
> > > can do more of this but will take step by step.
> > >
> > > Finally last patch actually adds the support for FSL-PAMU :)
> > 
> > Patches 1-3: msi_get_region needs to return an error an error (probably
> > -EINVAL) if called on a path where there's no backend implementation.
> > Otherwise the caller doesn't know that the data in the region pointer isn't
> > valid.
> 
> will correct.
> 
> > 
> > Patches 5&6: same as above for msi_set_iova, return an error if no backend
> > implementation.
> 
> Ok
> 
> > 
> > Patch 7: Why does fsl_msi_del_iova_device bother to return anything if it's
> > always zero?  Return -ENODEV when not found?
> 
> Will make -ENODEV.
> 
> > 
> > Patch 9:
> > 
> > vfio_handle_get_attr() passes random kernel data back to userspace in the event
> > of iommu_domain_get_attr() error.
> 
> Will correct.
> 
> > 
> > vfio_handle_set_attr(): I don't see any data validation happening, is
> > iommu_domain_set_attr() really that safe?
> 
> We do not need any data validation here and iommu driver does whatever needed.
> So yes,  iommu_domain_set_attr() is safe.
> 
> > 
> > For both of those, drop the pr_err on unknown attribute, it's sufficient to
> > return error.
> 
> ok
> 
> > 
> > Is VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT per aperture (ie. each vfio user has
> > $COUNT regions at their disposal exclusively)?
> 
> Number of msi-bank count is system wide and not per aperture, But will be setting windows for banks in the device aperture.
> So say if we are direct assigning 2 pci device (both have different iommu group, so 2 aperture in iommu) to VM.
> Now qemu can make only one call to know how many msi-banks are there but it must set sub-windows for all banks for both pci device in its respective aperture.

I'm still confused.  What I want to make sure of is that the banks are
independent per aperture.  For instance, if we have two separate
userspace processes operating independently and they both chose to use
msi bank zero for their device, that's bank zero within each aperture
and doesn't interfere.  Or another way to ask is can a malicious user
interfere with other users by using the wrong bank.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH v3] KVM: PPC: vfio kvm device: support spapr tce
From: Alex Williamson @ 2013-11-21 20:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: kvm-ppc, linuxppc-dev, linux-kernel, kvm
In-Reply-To: <528D6065.3030400@ozlabs.ru>

On Thu, 2013-11-21 at 12:22 +1100, Alexey Kardashevskiy wrote:
> On 11/21/2013 07:57 AM, Alex Williamson wrote:
> > On Wed, 2013-11-20 at 16:18 +1100, Alexey Kardashevskiy wrote:
> >> In addition to the external VFIO user API, a VFIO KVM device
> >> has been introduced recently.
> >>
> >> sPAPR TCE IOMMU is para-virtualized and the guest does map/unmap
> >> via hypercalls which take a logical bus id (LIOBN) as a target IOMMU
> >> identifier. LIOBNs are made up and linked to IOMMU groups by the user
> >> space. In order to accelerate IOMMU operations in the KVM, we need
> >> to tell KVM the information about LIOBN-to-group mapping.
> >>
> >> For that, a new KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN parameter
> >> is added. It accepts a pair of a VFIO group fd and LIOBN.
> >>
> >> This also adds a new kvm_vfio_find_group_by_liobn() function which
> >> receives kvm struct, LIOBN and a callback. As it increases the IOMMU
> >> group use counter, the KVMr is required to pass a callback which
> >> called when the VFIO group is about to be removed VFIO-KVM tracking so
> >> the KVM is able to call iommu_group_put() to release the IOMMU group.
> >>
> >> The KVM uses kvm_vfio_find_group_by_liobn() once per KVM run and caches
> >> the result in kvm_arch. iommu_group_put() for all groups will be called
> >> when KVM finishes (in the SPAPR TCE in KVM enablement patch).
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >> Changes:
> >> v3:
> >> * total rework
> >> * added a release callback into kvm_vfio_find_group_by_liobn so now
> >> the user of the API can get a notification if the group is about to
> >> disappear
> >> ---
> >>  Documentation/virtual/kvm/devices/vfio.txt |  19 ++++-
> >>  arch/powerpc/kvm/Kconfig                   |   1 +
> >>  arch/powerpc/kvm/Makefile                  |   3 +
> >>  include/linux/kvm_host.h                   |  18 +++++
> >>  include/uapi/linux/kvm.h                   |   7 ++
> >>  virt/kvm/vfio.c                            | 116 ++++++++++++++++++++++++++++-
> >>  6 files changed, 161 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
> >> index ef51740..7ecb3b2 100644
> >> --- a/Documentation/virtual/kvm/devices/vfio.txt
> >> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> >> @@ -16,7 +16,22 @@ Groups:
> >>  
> >>  KVM_DEV_VFIO_GROUP attributes:
> >>    KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> >> +	kvm_device_attr.addr points to an int32_t file descriptor
> >> +	for the VFIO group.
> >> +
> >>    KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
> >> +	kvm_device_attr.addr points to an int32_t file descriptor
> >> +	for the VFIO group.
> >> +
> >> +  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: sets a liobn for a VFIO group
> >> +	kvm_device_attr.addr points to a struct:
> >> +		struct kvm_vfio_spapr_tce_liobn {
> >> +			__u32	argsz;
> >> +			__u32	fd;
> > 
> > fds are signed, __s32
> > 
> >> +			__u32	liobn;
> >> +		};
> >> +		where
> >> +		@argsz is a struct size;
> >> +		@fd is a file descriptor for a VFIO group;
> >> +		@liobn is a logical bus id to be associated with the group.
> >>  
> >> -For each, kvm_device_attr.addr points to an int32_t file descriptor
> >> -for the VFIO group.
> >> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> >> index 61b3535..d1b7f64 100644
> >> --- a/arch/powerpc/kvm/Kconfig
> >> +++ b/arch/powerpc/kvm/Kconfig
> >> @@ -60,6 +60,7 @@ config KVM_BOOK3S_64
> >>  	select KVM_BOOK3S_64_HANDLER
> >>  	select KVM
> >>  	select SPAPR_TCE_IOMMU
> >> +	select KVM_VFIO
> >>  	---help---
> >>  	  Support running unmodified book3s_64 and book3s_32 guest kernels
> >>  	  in virtual machines on book3s_64 host processors.
> >> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> >> index 6646c95..2438d2e 100644
> >> --- a/arch/powerpc/kvm/Makefile
> >> +++ b/arch/powerpc/kvm/Makefile
> >> @@ -87,6 +87,9 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
> >>  kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
> >>  	book3s_xics.o
> >>  
> >> +kvm-book3s_64-objs-$(CONFIG_KVM_VFIO) += \
> >> +	$(KVM)/vfio.o \
> >> +
> >>  kvm-book3s_64-module-objs := \
> >>  	$(KVM)/kvm_main.o \
> >>  	$(KVM)/eventfd.o \
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index 88ff96a..1d2ad5e 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1112,5 +1112,23 @@ static inline bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
> >>  }
> >>  
> >>  #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
> >> +
> >> +typedef void (*kvm_vfio_release_group_callback)(struct kvm *kvm,
> >> +		unsigned long liobn);
> > 
> > liobn was said to be __u32 in kvm_vfio_spapr_tce_liobn above, here it's
> > unsigned long?
> 
> 
> PAPR spec says it is 32 bit and kvm_vfio_spapr_tce_liobn is a binary
> interface (ABI?) so I want it to be precise.
> 
> However kvmppc_get_gpr() (used to parse hypercall parameters such as liobn)
> return unsigned long. So inside the kernel I use unsigned long.
> 
> So what does make sense to change here?

Perhaps nothing if that's clear to those familiar with PAPR.  Thanks,

Alex


> > 
> >> +
> >> +#if defined(CONFIG_KVM_VFIO) && defined(CONFIG_SPAPR_TCE_IOMMU)
> >> +
> >> +extern struct iommu_group *kvm_vfio_find_group_by_liobn(struct kvm *kvm,
> >> +		unsigned long liobn, kvm_vfio_release_group_callback cb);
> >> +
> >> +#else
> >> +
> >> +static inline struct iommu_group *kvm_vfio_find_group_by_liobn(struct kvm *kvm,
> >> +		unsigned long liobn, ikvm_vfio_release_group_callback cb)
> >> +{
> >> +	return NULL;
> >> +}
> >> +#endif /* CONFIG_KVM_VFIO && CONFIG_SPAPR_TCE_IOMMU */
> >> +
> >>  #endif
> >>  
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 7c1a349..3d77dde 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -847,6 +847,13 @@ struct kvm_device_attr {
> >>  #define  KVM_DEV_VFIO_GROUP			1
> >>  #define   KVM_DEV_VFIO_GROUP_ADD			1
> >>  #define   KVM_DEV_VFIO_GROUP_DEL			2
> >> +#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN	3
> >> +
> >> +struct kvm_vfio_spapr_tce_liobn {
> >> +	__u32	argsz;
> >> +	__u32	fd;
> >> +	__u32	liobn;
> >> +};
> >>  
> >>  /*
> >>   * ioctls for VM fds
> >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >> index ca4260e..448910d 100644
> >> --- a/virt/kvm/vfio.c
> >> +++ b/virt/kvm/vfio.c
> >> @@ -22,6 +22,12 @@
> >>  struct kvm_vfio_group {
> >>  	struct list_head node;
> >>  	struct vfio_group *vfio_group;
> >> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> >> +	struct {
> >> +		unsigned long liobn;
> >> +		kvm_vfio_release_group_callback cb;
> >> +	} spapr_tce;
> >> +#endif
> >>  };
> >>  
> >>  struct kvm_vfio {
> >> @@ -59,6 +65,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
> >>  	symbol_put(vfio_group_put_external_user);
> >>  }
> >>  
> >> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> >> +struct iommu_group *kvm_vfio_find_group_by_liobn(struct kvm *kvm,
> >> +		unsigned long liobn, kvm_vfio_release_group_callback cb)
> >> +{
> >> +	struct kvm_vfio_group *kvg;
> >> +	int group_id;
> >> +	struct iommu_group *grp;
> >> +	struct kvm_vfio *kv = NULL;
> >> +	struct kvm_device *tmp;
> >> +
> >> +	if (!cb)
> >> +		return NULL;
> > 
> > Is it worthwhile to use ERR_PTR(-EINVAL) here and the caller can use
> > IS_ERR()?
> > 
> >> +
> >> +	/* Find a VFIO KVM device */
> >> +	list_for_each_entry(tmp, &kvm->devices, vm_node) {
> >> +		if (tmp->ops != &kvm_vfio_ops)
> >> +			continue;
> >> +
> >> +		kv = tmp->private;
> >> +		break;
> >> +	}
> >> +
> >> +	if (!kv)
> >> +		return NULL;
> > 
> > ERR_PTR(-EFAULT)?  EIO?
> > 
> >> +
> >> +	/* Find a group */
> > 
> > Still ignoring kv->lock
> > 
> >> +	list_for_each_entry(kvg, &kv->group_list, node) {
> >> +		if (kvg->spapr_tce.liobn != liobn)
> >> +			continue;
> >> +
> >> +		if (kvg->spapr_tce.cb)
> >> +			return NULL;
> > 
> > ERR_PTR(-EBUSY)?
> > 
> >> +
> >> +		kvg->spapr_tce.cb = cb;
> >> +		group_id = vfio_external_user_iommu_id(kvg->vfio_group);
> >> +		grp = iommu_group_get_by_id(group_id);
> >> +
> >> +		return grp;
> >> +	}
> >> +
> >> +	return NULL;
> > 
> > ERR_PTR(-ENODEV)?
> > 
> >> +}
> >> +EXPORT_SYMBOL_GPL(kvm_vfio_find_group_by_liobn);
> >> +#endif
> >> +
> >>  /*
> >>   * Groups can use the same or different IOMMU domains.  If the same then
> >>   * adding a new group may change the coherency of groups we've previously
> >> @@ -170,6 +221,11 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>  				continue;
> >>  
> >>  			list_del(&kvg->node);
> >> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> >> +			if (kvg->spapr_tce.cb)
> >> +				kvg->spapr_tce.cb(dev->kvm,
> >> +						kvg->spapr_tce.liobn);
> >> +#endif
> >>  			kvm_vfio_group_put_external_user(kvg->vfio_group);
> >>  			kfree(kvg);
> >>  			ret = 0;
> >> @@ -183,6 +239,62 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>  		kvm_vfio_update_coherency(dev);
> >>  
> >>  		return ret;
> >> +
> >> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> >> +	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: {
> >> +		struct kvm_vfio_spapr_tce_liobn param;
> >> +		unsigned long minsz;
> >> +		struct kvm_vfio *kv = dev->private;
> >> +		struct vfio_group *vfio_group;
> >> +		struct kvm_vfio_group *kvg;
> >> +		struct fd f;
> >> +
> >> +		minsz = offsetofend(struct kvm_vfio_spapr_tce_liobn, liobn);
> >> +
> >> +		if (copy_from_user(&param, (void __user *)arg, minsz))
> >> +			return -EFAULT;
> >> +
> >> +		if (param.argsz < minsz)
> >> +			return -EINVAL;
> >> +
> >> +		if (copy_from_user(&param, (void __user *)arg, minsz))
> >> +			return -EFAULT;
> > 
> > copy_from_user twice?  Extra copy here?
> 
> 
> Oh. All I wanted was to read minsz first but I cut-n-pasted too blindely :)
> 
> > 
> >> +
> >> +		f = fdget(param.fd);
> >> +		if (!f.file)
> >> +			return -EBADF;
> >> +
> >> +		vfio_group = kvm_vfio_group_get_external_user(f.file);
> >> +		fdput(f);
> >> +
> >> +		if (IS_ERR(vfio_group))
> >> +			return PTR_ERR(vfio_group);
> >> +
> >> +		ret = -ENOENT;
> >> +
> >> +		mutex_lock(&kv->lock);
> >> +
> >> +		list_for_each_entry(kvg, &kv->group_list, node) {
> >> +			if (kvg->vfio_group != vfio_group)
> >> +				continue;
> >> +
> >> +			if (kvg->spapr_tce.liobn) {
> >> +				ret = -EBUSY;
> >> +				break;
> >> +			}
> > 
> > Is zero not an liobn that can be used by userspace?
> 
> Good point, thanks. I thought zero cannot be used but by spec -1 is reserved
> 
> 
> >  Is it intentional
> > that there's no way to unset or change the group/liobn mapping?  Thanks,
> 
> 
> I do not see why we would want to disable once enabled acceleration. Group
> removal should clear it though so we are good.
> 
> Thanks for the review and patience :)
> 
> 
> > 
> > Alex
> > 
> >> +
> >> +			kvg->spapr_tce.liobn = param.liobn;
> >> +			ret = 0;
> >> +			break;
> >> +		}
> >> +
> >> +		mutex_unlock(&kv->lock);
> >> +
> >> +		kvm_vfio_group_put_external_user(vfio_group);
> >> +
> >> +		return ret;
> >> +	}
> >> +#endif /* CONFIG_SPAPR_TCE_IOMMU */
> >>  	}
> >>  
> >>  	return -ENXIO;
> >> @@ -207,9 +319,11 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >>  		switch (attr->attr) {
> >>  		case KVM_DEV_VFIO_GROUP_ADD:
> >>  		case KVM_DEV_VFIO_GROUP_DEL:
> >> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> >> +		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN:
> >> +#endif
> >>  			return 0;
> >>  		}
> >> -
> >>  		break;
> >>  	}
> >>  
> > 
> > 
> > 
> 
> 

^ permalink raw reply

* Re: [PATCH v2] offb: make the screen properties endian safe
From: Benjamin Herrenschmidt @ 2013-11-21 19:57 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: linuxppc-dev
In-Reply-To: <528E2A89.90601@fr.ibm.com>

On Thu, 2013-11-21 at 16:45 +0100, Cedric Le Goater wrote:
> >  - fbdev *generally* assume native endian framebuffer, but of course
> > under qemu today, the adapter will use a big endian frame buffer
> > aperture. You can compile in support for foreign endian but I don't know
> > how that actually works.
> 
> OK. I will see how I can extend the tests. But, are you suggesting I should
> be using the foreign endian framework for the frame buffer ? 

Well, if it works ... did you try 16 and 32bpp ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 24/34] PCI: use weak functions for MSI arch-specific functions
From: Ralf Baechle @ 2013-11-21 19:33 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Richard Zhu, linux-s390, linux-ia64, Tony Luck, Russell King,
	Jason Cooper, Fenghua Yu, linux-mips, x86, Heiko Carstens,
	Chris Metcalf, sparclinux, Ingo Molnar, Paul Mackerras,
	H. Peter Anvin, Martin Schwidefsky, linux390, Thomas Gleixner,
	linuxppc-dev, David S. Miller
In-Reply-To: <20131121200804.617d6ae6@skate>

On Thu, Nov 21, 2013 at 08:08:04PM +0100, Thomas Petazzoni wrote:

> I think this patch was mistakenly sent by Richard Zhu. It is already
> part of mainline since 3.12:

Explains the deja vue ...

  Ralf

^ permalink raw reply

* Re: [PATCH 24/34] PCI: use weak functions for MSI arch-specific functions
From: Thomas Petazzoni @ 2013-11-21 19:08 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Richard Zhu, linux-s390, linux-ia64, Tony Luck, Russell King,
	Jason Cooper, Fenghua Yu, linux-mips, x86, Heiko Carstens,
	Chris Metcalf, sparclinux, Ingo Molnar, Paul Mackerras,
	H. Peter Anvin, Martin Schwidefsky, linux390, Thomas Gleixner,
	linuxppc-dev, David S. Miller
In-Reply-To: <20131121173933.GT10382@linux-mips.org>

Dear Ralf Baechle,

On Thu, 21 Nov 2013 18:39:33 +0100, Ralf Baechle wrote:
> On Wed, Nov 20, 2013 at 10:50:43AM +0800, Richard Zhu wrote:
> 
> Looking good,
> 
> Acked-by: Ralf Baechle <ralf@linux-mips.org>

I think this patch was mistakenly sent by Richard Zhu. It is already
part of mainline since 3.12:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/msi.c?id=4287d824f265451cd10f6d20266b27a207a6cdd7.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: powerpc: Fix missing includes needed for chroma
From: Jimi Xenidis @ 2013-11-21 17:49 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Yoonho Park, Linux PPC dev

[-- Attachment #1: Type: text/plain, Size: 3119 bytes --]

Drop it now :)
However, it may be nice to keep the A2 stuff until BGQ EOLs
-jx



On Wed, Nov 20, 2013 at 8:40 PM, Michael Neuling <mikey@neuling.org> wrote:

> chroma_defconfig is horribly broken currently, so add a bunch of
> #includes to fix it.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
> So when are we dropping arch/powerpc/platforms/wsp?
>
> diff --git a/arch/powerpc/platforms/wsp/chroma.c
> b/arch/powerpc/platforms/wsp/chroma.c
> index 8ef53bc..aaa46b3 100644
> --- a/arch/powerpc/platforms/wsp/chroma.c
> +++ b/arch/powerpc/platforms/wsp/chroma.c
> @@ -15,6 +15,7 @@
>  #include <linux/of.h>
>  #include <linux/smp.h>
>  #include <linux/time.h>
> +#include <linux/of_fdt.h>
>
>  #include <asm/machdep.h>
>  #include <asm/udbg.h>
> diff --git a/arch/powerpc/platforms/wsp/h8.c
> b/arch/powerpc/platforms/wsp/h8.c
> index d18e6cc..a3c87f3 100644
> --- a/arch/powerpc/platforms/wsp/h8.c
> +++ b/arch/powerpc/platforms/wsp/h8.c
> @@ -10,6 +10,7 @@
>  #include <linux/kernel.h>
>  #include <linux/of.h>
>  #include <linux/io.h>
> +#include <linux/of_address.h>
>
>  #include "wsp.h"
>
> diff --git a/arch/powerpc/platforms/wsp/ics.c
> b/arch/powerpc/platforms/wsp/ics.c
> index 2d3b1dd..3b782ce 100644
> --- a/arch/powerpc/platforms/wsp/ics.c
> +++ b/arch/powerpc/platforms/wsp/ics.c
> @@ -18,6 +18,8 @@
>  #include <linux/smp.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_address.h>
>
>  #include <asm/io.h>
>  #include <asm/irq.h>
> diff --git a/arch/powerpc/platforms/wsp/opb_pic.c
> b/arch/powerpc/platforms/wsp/opb_pic.c
> index cb565bf..3f67298 100644
> --- a/arch/powerpc/platforms/wsp/opb_pic.c
> +++ b/arch/powerpc/platforms/wsp/opb_pic.c
> @@ -15,6 +15,8 @@
>  #include <linux/of.h>
>  #include <linux/slab.h>
>  #include <linux/time.h>
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
>
>  #include <asm/reg_a2.h>
>  #include <asm/irq.h>
> diff --git a/arch/powerpc/platforms/wsp/psr2.c
> b/arch/powerpc/platforms/wsp/psr2.c
> index 508ec82..a87b414 100644
> --- a/arch/powerpc/platforms/wsp/psr2.c
> +++ b/arch/powerpc/platforms/wsp/psr2.c
> @@ -15,6 +15,7 @@
>  #include <linux/of.h>
>  #include <linux/smp.h>
>  #include <linux/time.h>
> +#include <linux/of_fdt.h>
>
>  #include <asm/machdep.h>
>  #include <asm/udbg.h>
> diff --git a/arch/powerpc/platforms/wsp/scom_wsp.c
> b/arch/powerpc/platforms/wsp/scom_wsp.c
> index 8928507..6538b4d 100644
> --- a/arch/powerpc/platforms/wsp/scom_wsp.c
> +++ b/arch/powerpc/platforms/wsp/scom_wsp.c
> @@ -14,6 +14,7 @@
>  #include <linux/of.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
> +#include <linux/of_address.h>
>
>  #include <asm/cputhreads.h>
>  #include <asm/reg_a2.h>
> diff --git a/arch/powerpc/platforms/wsp/wsp.c
> b/arch/powerpc/platforms/wsp/wsp.c
> index ddb6efe..58cd1f0 100644
> --- a/arch/powerpc/platforms/wsp/wsp.c
> +++ b/arch/powerpc/platforms/wsp/wsp.c
> @@ -13,6 +13,7 @@
>  #include <linux/smp.h>
>  #include <linux/delay.h>
>  #include <linux/time.h>
> +#include <linux/of_address.h>
>
>  #include <asm/scom.h>
>
>

[-- Attachment #2: Type: text/html, Size: 4027 bytes --]

^ permalink raw reply

* Re: [PATCH 24/34] PCI: use weak functions for MSI arch-specific functions
From: Ralf Baechle @ 2013-11-21 17:39 UTC (permalink / raw)
  To: Richard Zhu
  Cc: Thomas Petazzoni, linux-s390, linux-ia64, Tony Luck, Russell King,
	Jason Cooper, Fenghua Yu, linux-mips, x86, Heiko Carstens,
	Chris Metcalf, sparclinux, Ingo Molnar, Paul Mackerras,
	H. Peter Anvin, Martin Schwidefsky, linux390, Thomas Gleixner,
	linuxppc-dev, David S. Miller
In-Reply-To: <1384915853-31006-24-git-send-email-r65037@freescale.com>

On Wed, Nov 20, 2013 at 10:50:43AM +0800, Richard Zhu wrote:

Looking good,

Acked-by: Ralf Baechle <ralf@linux-mips.org>

Nevertheless I'd again like to express that I'm not that fond of of the
increasing number of weak functions in the kernel.  In the old days
things were such that when an a platform didn't provice a platform hook
or enable a default hook function, one would get a build error - an
unmistakable sign to the maintainer that something needs attention.
Weak functions mean default functions may result in subtly incorrect
operation.  Been there, got bitten.

  Ralf

^ permalink raw reply

* Re: [RFC][PATCH] powerpc/CoreNet64: compile with CONFIG_E{5,6}500_CPU well
From: Scott Wood @ 2013-11-21 17:42 UTC (permalink / raw)
  To: "“tiejun.chen”"; +Cc: linuxppc-dev
In-Reply-To: <528DCB61.8040807@windriver.com>

On Thu, 2013-11-21 at 16:59 +0800, "=E2=80=9Ctiejun.chen=E2=80=9D" wrote:
> On 11/21/2013 12:41 AM, Scott Wood wrote:
> > On Wed, 2013-11-20 at 16:35 +0800, Tiejun Chen wrote:
> >> +# Altivec and Spe options not allowed with e500mc64 in GCC.
> >> +ifeq ($(call cc-option-yn,-mcpu=3De500mc64),n)
> >>   obj-$(CONFIG_ALTIVEC)	+=3D xor_vmx.o
> >>   CFLAGS_xor_vmx.o +=3D -maltivec -mabi=3Daltivec
> >> +endif
> >
> > This does not seem like the right fix.  What if GCC supports both
> > -mcpu=3De500mc64 and -mcpu=3De6500, and we're using the latter?  Or f=
or that
>=20
> I can understand what you mean, but in current kernel, -mcpu=3De500mc64=
 should be=20
> excluded from -mcpu=3De6500,
>=20
> arch/powerpc/Makefile:
>=20
> E5500_CPU :=3D $(call cc-option,-mcpu=3De500mc64,-mcpu=3Dpowerpc64)
> CFLAGS-$(CONFIG_E5500_CPU) +=3D $(E5500_CPU)
> CFLAGS-$(CONFIG_E6500_CPU) +=3D $(call cc-option,-mcpu=3De6500,$(E5500_=
CPU))

But your patch doesn't test what options we're actually using.  You
tested what options the compiler supports.

> But unfortunately, another place also use the same option,
>=20
> lib/raid6/Makefile:
>=20
> raid6_pq-$(CONFIG_ALTIVEC) +=3D altivec1.o altivec2.o altivec4.o altive=
c8.o
> ...
> ifeq ($(CONFIG_ALTIVEC),y)
> altivec_flags :=3D -maltivec -mabi=3Daltivec
> endif
>=20
> Looks we have to do something in this common Makefile file as well, but=
 it may=20
> be a bit ugly if still judge some cpu-specific flags...
>=20
> So what about this version?
>=20
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 607acf5..872a85c 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -127,7 +127,12 @@ CFLAGS-$(CONFIG_POWER5_CPU) +=3D $(call cc-option,=
-mcpu=3Dpower5)
>   CFLAGS-$(CONFIG_POWER6_CPU) +=3D $(call cc-option,-mcpu=3Dpower6)
>   CFLAGS-$(CONFIG_POWER7_CPU) +=3D $(call cc-option,-mcpu=3Dpower7)
>=20
> +# Altivec and Spe options not allowed with e500mc64 in GCC.
> +ifeq ($(CONFIG_ALTIVEC),)
>   E5500_CPU :=3D $(call cc-option,-mcpu=3De500mc64,-mcpu=3Dpowerpc64)
> +else
> +E5500_CPU :=3D -mcpu=3Dpowerpc64
> +endif
>   CFLAGS-$(CONFIG_E5500_CPU) +=3D $(E5500_CPU)
>   CFLAGS-$(CONFIG_E6500_CPU) +=3D $(call cc-option,-mcpu=3De6500,$(E550=
0_CPU))

Reverse the if/else so it uses positive logic, and remove the irrelevant
SPE from the comment, but otherwise I guess it's OK.

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc/signals: Mark VSX not saved with small contexts
From: Carlos O'Donell @ 2013-11-21 16:03 UTC (permalink / raw)
  To: Michael Ellerman, Michael Neuling; +Cc: Steve Best, linuxppc-dev, Haren Myneni
In-Reply-To: <20131121113333.GB15913@concordia>

On 11/21/2013 06:33 AM, Michael Ellerman wrote:
> On Wed, Nov 20, 2013 at 04:18:54PM +1100, Michael Neuling wrote:
>> The VSX MSR bit in the user context indicates if the context contains VSX
>> state.  Currently we set this when the process has touched VSX at any stage.
>>
>> Unfortunately, if the user has not provided enough space to save the VSX state,
>> we can't save it but we currently still set the MSR VSX bit.
>>
>> This patch changes this to clear the MSR VSX bit when the user doesn't provide
>> enough space.  This indicates that there is no valid VSX state in the user
>> context.
>>
>> This is needed to support get/set/make/swapcontext for applications that use
>> VSX but only provide a small context.  For example, getcontext in glibc
>> provides a smaller context since the VSX registers don't need to be saved over
>> the glibc function call.  But since the program calling getcontext may have
>> used VSX, the kernel currently says the VSX state is valid when it's not.  If
>> the returned context is then used in setcontext (ie. a small context without
>> VSX but with MSR VSX set), the kernel will refuse the context.  This situation
>> has been reported by the glibc community.
> 
> Broken since forever?

Yes, broken since forever. At least it was known in glibc 2.18 that this was
broken, but without an active distribution using it the defect wasn't analyzed.

>> Tested-by: Haren Myneni <haren@linux.vnet.ibm.com>
>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>> Cc: stable@vger.kernel.org
>> ---
>>  arch/powerpc/kernel/signal_32.c | 10 +++++++++-
> 
> What about the 64-bit code? I don't know the code but it appears at a glance to
> have the same bug.
 
It doesn't happen with 64-bit code because there the context contains
a sigcontext which on ppc64 has vmx_reserve to store the entire VMX
state. Therefore 64-bit ppc always has space to store the VMX registers
in a userspace context switch. It is only the 32-bit ppc ABI that lacks
the space.
 
>> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
>> index 749778e..1844298 100644
>> --- a/arch/powerpc/kernel/signal_32.c
>> +++ b/arch/powerpc/kernel/signal_32.c
>> @@ -457,7 +457,15 @@ static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
>>  		if (copy_vsx_to_user(&frame->mc_vsregs, current))
>>  			return 1;
>>  		msr |= MSR_VSX;
>> -	}
>> +	} else if (!ctx_has_vsx_region)
>> +		/*
>> +		 * With a small context structure we can't hold the VSX
>> +		 * registers, hence clear the MSR value to indicate the state
>> +		 * was not saved.
>> +		 */
>> +		msr &= ~MSR_VSX;
> 
> I think it'd be clearer if this was just the else case. The full context is:
> 
>     if (current->thread.used_vsr && ctx_has_vsx_region) {
>             __giveup_vsx(current);
>             if (copy_vsx_to_user(&frame->mc_vsregs, current))
>                     return 1;
>             msr |= MSR_VSX;
> +   } else if (!ctx_has_vsx_region)
> +           /*
> +            * With a small context structure we can't hold the VSX
> +            * registers, hence clear the MSR value to indicate the state
> +            * was not saved.
> +            */
> +           msr &= ~MSR_VSX;
> 
> Which means if current->thread.user_vsr and ctx_has_vsx_region are both false
> we potentially leave MSR_VSX set in msr. I think it should be the case that
> MSR_VSX is only ever set if current->thread.used_vsr is true, so it should be
> OK in pratice, but it seems unnecessarily fragile.

If current->thread.user_vsr and ctx_has_vsx_region are both false then
!ctx_has_vsx_region is true and we clear MSR_VSX.

Perhaps you mean if current->thread.user_vsr is false, but ctx_has_vsx_region
is true? 

Previously the else clause reset MSR_VSX if:
1. current->thread.used_vsr == 0 && ctx_has_vsx_region == 0
2. current->thread.used_vsr == 1 && ctx_has_vsx_region == 0,

Now it resets MSR_VSX additionally for:
3. current->thread.used_vsr == 0 && ctx_has_vsx_region == 1,

3. is a valid state. The task has not touched the VSX state and the context
is large enough to be saved into. This may be a future state for ppc32 if 
we adjust the ABI to have a large enough context buffer. However at present 
it's not a plausible state. It's also a "don't care" state since there is 
nothing saved in the context, and if nothing was saved in the context then
MSR_VSX is not set.
 
> The logic should be "if we write VSX we set MSR_VSX, else we clear MSR_VSX", ie:
> 
>     if (current->thread.used_vsr && ctx_has_vsx_region) {
>             __giveup_vsx(current);
>             if (copy_vsx_to_user(&frame->mc_vsregs, current))
>                     return 1;
>             msr |= MSR_VSX;
>     } else
>             msr &= ~MSR_VSX;

If anything I dislike this because it might mask a bug in earlier code that
might erroneously set MSR_VSX even though current->thread.user_vsr is not
true. If anything the extra state 3. covered here is a buggy state.

I agree that your suggestion is more robust though since the definition of
robustness is to operate despite failures.

Cheers,
Carlos.

^ permalink raw reply

* Re: [PATCH v2] offb: make the screen properties endian safe
From: Cedric Le Goater @ 2013-11-21 15:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1384987801.26969.114.camel@pasglop>

Hi,

On 11/20/2013 11:50 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-10-31 at 10:36 +0100, Cédric Le Goater wrote:
>> The "screen" properties : depth, width, height, linebytes need
>> to be converted to the host endian order when read from the device
>> tree.
>>
>> Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
>> ---
> 
> Did you actually test that ? IE, using emulated VGA in qemu for
> example ?

Yes, that's how I did. I ran LE and BE guests with "qemu -vga std" and 
different depths.

> I'm asking because there are a few interesting nits here...
> 
>  - fbdev *generally* assume native endian framebuffer, but of course
> under qemu today, the adapter will use a big endian frame buffer
> aperture. You can compile in support for foreign endian but I don't know
> how that actually works.

OK. I will see how I can extend the tests. But, are you suggesting I should
be using the foreign endian framework for the frame buffer ? 
 
>  - The setcolreg fix ... the "value" is used 2 lines above your endian
> swap, is that correct ?

The variables used to calculate "value" are in host endian order. It should
be fine.

Thanks,

C.

 
> Cheers
> Ben.
> 
> 
>> Changes in v2:
>>
>>  - replaced be32_to_cpu() by be32_to_cpup() 
>>  - fixed setcolreg ops 
>>
>>  drivers/video/offb.c |   12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/video/offb.c b/drivers/video/offb.c
>> index 0c4f343..68e8415 100644
>> --- a/drivers/video/offb.c
>> +++ b/drivers/video/offb.c
>> @@ -120,7 +120,7 @@ static int offb_setcolreg(u_int regno, u_int red, u_int green, u_int blue,
>>  			mask <<= info->var.transp.offset;
>>  			value |= mask;
>>  		}
>> -		pal[regno] = value;
>> +		pal[regno] = cpu_to_be32(value);
>>  		return 0;
>>  	}
>>  
>> @@ -536,7 +536,7 @@ static void __init offb_init_nodriver(struct device_node *dp, int no_real_node)
>>  	unsigned int flags, rsize, addr_prop = 0;
>>  	unsigned long max_size = 0;
>>  	u64 rstart, address = OF_BAD_ADDR;
>> -	const u32 *pp, *addrp, *up;
>> +	const __be32 *pp, *addrp, *up;
>>  	u64 asize;
>>  	int foreign_endian = 0;
>>  
>> @@ -552,25 +552,25 @@ static void __init offb_init_nodriver(struct device_node *dp, int no_real_node)
>>  	if (pp == NULL)
>>  		pp = of_get_property(dp, "depth", &len);
>>  	if (pp && len == sizeof(u32))
>> -		depth = *pp;
>> +		depth = be32_to_cpup(pp);
>>  
>>  	pp = of_get_property(dp, "linux,bootx-width", &len);
>>  	if (pp == NULL)
>>  		pp = of_get_property(dp, "width", &len);
>>  	if (pp && len == sizeof(u32))
>> -		width = *pp;
>> +		width = be32_to_cpup(pp);
>>  
>>  	pp = of_get_property(dp, "linux,bootx-height", &len);
>>  	if (pp == NULL)
>>  		pp = of_get_property(dp, "height", &len);
>>  	if (pp && len == sizeof(u32))
>> -		height = *pp;
>> +		height = be32_to_cpup(pp);
>>  
>>  	pp = of_get_property(dp, "linux,bootx-linebytes", &len);
>>  	if (pp == NULL)
>>  		pp = of_get_property(dp, "linebytes", &len);
>>  	if (pp && len == sizeof(u32) && (*pp != 0xffffffffu))
>> -		pitch = *pp;
>> +		pitch = be32_to_cpup(pp);
>>  	else
>>  		pitch = width * ((depth + 7) / 8);
>>  
> 
> 

^ permalink raw reply

* Re: [RFC][PATCH] powerpc/CoreNet64: compile with CONFIG_E{5, 6}500_CPU well
From: "“tiejun.chen”" @ 2013-11-21  8:59 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <1384965671.1403.414.camel@snotra.buserror.net>

On 11/21/2013 12:41 AM, Scott Wood wrote:
> On Wed, 2013-11-20 at 16:35 +0800, Tiejun Chen wrote:
>> CONFIG_ALTIVEC is always enabled for CoreNet64.
>
> In the defconfig perhaps, but this isn't a generally true statement.
>

Yes, but I think we should avoid this probable scenario :)

>> And if we select CONFIG_E{5,6}500_CPU this may introduce -mcpu=e500mc64
>> into $CFLAGS. But Altivec and Spe options not allowed with
>> e500mc64, so :
>
> Sigh.
>
>>    CC      arch/powerpc/lib/xor_vmx.o
>> arch/powerpc/lib/xor_vmx.c:1:0: error: AltiVec not supported in this target
>> make[1]: *** [arch/powerpc/lib/xor_vmx.o] Error 1
>> make: *** [arch/powerpc/lib] Error 2
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> ---
>>   arch/powerpc/lib/Makefile |    3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
>> index 95a20e1..641a77d 100644
>> --- a/arch/powerpc/lib/Makefile
>> +++ b/arch/powerpc/lib/Makefile
>> @@ -40,5 +40,8 @@ obj-y			+= code-patching.o
>>   obj-y			+= feature-fixups.o
>>   obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
>>
>> +# Altivec and Spe options not allowed with e500mc64 in GCC.
>> +ifeq ($(call cc-option-yn,-mcpu=e500mc64),n)
>>   obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o
>>   CFLAGS_xor_vmx.o += -maltivec -mabi=altivec
>> +endif
>
> This does not seem like the right fix.  What if GCC supports both
> -mcpu=e500mc64 and -mcpu=e6500, and we're using the latter?  Or for that

I can understand what you mean, but in current kernel, -mcpu=e500mc64 should be 
excluded from -mcpu=e6500,

arch/powerpc/Makefile:

E5500_CPU := $(call cc-option,-mcpu=e500mc64,-mcpu=powerpc64)
CFLAGS-$(CONFIG_E5500_CPU) += $(E5500_CPU)
CFLAGS-$(CONFIG_E6500_CPU) += $(call cc-option,-mcpu=e6500,$(E5500_CPU))

But unfortunately, another place also use the same option,

lib/raid6/Makefile:

raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o
...
ifeq ($(CONFIG_ALTIVEC),y)
altivec_flags := -maltivec -mabi=altivec
endif

Looks we have to do something in this common Makefile file as well, but it may 
be a bit ugly if still judge some cpu-specific flags...

So what about this version?

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 607acf5..872a85c 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -127,7 +127,12 @@ CFLAGS-$(CONFIG_POWER5_CPU) += $(call cc-option,-mcpu=power5)
  CFLAGS-$(CONFIG_POWER6_CPU) += $(call cc-option,-mcpu=power6)
  CFLAGS-$(CONFIG_POWER7_CPU) += $(call cc-option,-mcpu=power7)

+# Altivec and Spe options not allowed with e500mc64 in GCC.
+ifeq ($(CONFIG_ALTIVEC),)
  E5500_CPU := $(call cc-option,-mcpu=e500mc64,-mcpu=powerpc64)
+else
+E5500_CPU := -mcpu=powerpc64
+endif
  CFLAGS-$(CONFIG_E5500_CPU) += $(E5500_CPU)
  CFLAGS-$(CONFIG_E6500_CPU) += $(call cc-option,-mcpu=e6500,$(E5500_CPU))

Tiejun

> matter, if we're using -mcpu=whatever-ibm-chip-has-this?
>
> Plus, wouldn't you need to do something to prevent code in that file
> from being called?
>
> -Scott
>
>
>
>

^ permalink raw reply related

* Re: linux-next: build failure after merge of the final tree (Linus' tree related - powerpc)
From: Michael Ellerman @ 2013-11-21 12:11 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Stephen Rothwell, linux-next, linuxppc-dev, linux-kernel
In-Reply-To: <20131121143334.44be4231@kryten>

On Thu, Nov 21, 2013 at 02:33:34PM +1100, Anton Blanchard wrote:
> 
> Hi,
> 
> > After merging the final tree, today's linux-next build (powerpc
> > allyesconfig) failed like this:
> > 
> > <stdin>:1:0: error: -mcall-aixdesc must be big endian
> 
> Urgh, allyesconfig is building an LE kernel. Ian: do we need to reverse
> the polarity of the option, and call it CONFIG_CPU_BIG_ENDIAN?

More or less. See what we did with PPC_WERROR, for the same reasons, so
that it was off for an allyesconfig.

    config PPC_DISABLE_WERROR
            bool "Don't build arch/powerpc code with -Werror"
            default n
    
    config PPC_WERROR
            bool
            depends on !PPC_DISABLE_WERROR
            default y

cheers

^ permalink raw reply

* Re: [PATCH 8/9] powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.
From: Michael Ellerman @ 2013-11-21 12:05 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: rusty, Ulrich.Weigand, paulus, alistair, linuxppc-dev
In-Reply-To: <1384946106-18200-9-git-send-email-anton@samba.org>

On Wed, Nov 20, 2013 at 10:15:05PM +1100, Anton Blanchard wrote:
> With the little endian support merged, we can add the
> CONFIG_CPU_LITTLE_ENDIAN kernel config option.

This appears to do nothing, but the Makefile bits were already added in
commit d72b080 "powerpc: Add ability to build little endian kernels".

cheers

^ permalink raw reply

* Re: [PATCH 4/9] powerpc: Set eflags correctly for ELF ABIv2 core dumps.
From: Michael Ellerman @ 2013-11-21 11:58 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: rusty, Ulrich.Weigand, paulus, alistair, linuxppc-dev
In-Reply-To: <1384946106-18200-5-git-send-email-anton@samba.org>

On Wed, Nov 20, 2013 at 10:15:01PM +1100, Anton Blanchard wrote:
> From: Rusty Russell <rusty@rustcorp.com.au>
> 
> We leave it at zero (though it could be 1) for old tasks.

There's no issue with a v1 binary producing a v0 core dump?

I can't think of why there would be, if they are equivalent.

cheers

^ permalink raw reply

* Re: [PATCH 3/9] powerpc: Add TIF_ELF2ABI flag.
From: Michael Ellerman @ 2013-11-21 11:55 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: rusty, Ulrich.Weigand, paulus, alistair, linuxppc-dev
In-Reply-To: <1384946106-18200-4-git-send-email-anton@samba.org>

On Wed, Nov 20, 2013 at 10:15:00PM +1100, Anton Blanchard wrote:
> From: Rusty Russell <rusty@rustcorp.com.au>
> 
> Little endian ppc64 is getting an exciting new ABI.  This is reflected
> by the bottom two bits of e_flags in the ELF header:
> 
> 	0 == legacy binaries (v1 ABI)
> 	1 == binaries using the old ABI (compiled with a new toolchain)
> 	2 == binaries using the new ABI.

Just to be ridiculously clear for stupid people like me, you refer here
to the "v1 ABI" and "the old ABI" - they are the same thing - right?

> diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
> index ba7b197..05a3030 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -107,6 +107,9 @@ static inline struct thread_info *current_thread_info(void)
>  #define TIF_EMULATE_STACK_STORE	16	/* Is an instruction emulation
>  						for stack store? */
>  #define TIF_MEMDIE		17	/* is terminating due to OOM killer */
> +#if defined(CONFIG_PPC64)
> +#define TIF_ELF2ABI		18	/* function descriptors must die! */
> +#endif

This is the first TIF flag we #ifdef for 32 vs 64, is that just because
we don't want to waste a flag on 32 bit?

cheers

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox