LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* RE: PCIe Access - achieve bursts without DMA
From: David Laight @ 2014-01-30 14:19 UTC (permalink / raw)
  To: 'Moese, Michael', linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2DF74D4E746FF14C8697D5041AAE72D56A2B1420@MEN-EX2.intra.men.de>

>From Moese, Michael
> Hello PPC-developers,
> I'm currently trying to benchmark access speeds to our PCIe-connected IP-=
cores
> located inside our FPGA. On x86-based systems I was able to achieve burst=
s for
> both read and write access. On PPC32, using an e500v2, I had no success a=
t all
> so far.

I'm not sure that you can.
I had to write a simple driver for the PCIe CSB bridge dma on a 83xx ppc.
I think that might be the one in the e500v2.

I don't know how fast 'normal' PCIe slaves are, but we were accessing
an Altera fpga and the latency is less than pedestrian.
I think an ISA bus can run faster!
With moderate length transfers, the throughput was more than adequate.

	David

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Daniel Lezcano @ 2014-01-30 13:44 UTC (permalink / raw)
  To: Nicolas Pitre, Preeti U Murthy
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra, linux-sh,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Olof Johansson,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401300021530.1652@knanqh.ubzr>

On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
> On Thu, 30 Jan 2014, Preeti U Murthy wrote:
>
>> Hi Nicolas,
>>
>> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
>>> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
>>>
>>>> In order to integrate cpuidle with the scheduler, we must have a better
>>>> proximity in the core code with what cpuidle is doing and not delegate
>>>> such interaction to arch code.
>>>>
>>>> Architectures implementing arch_cpu_idle() should simply enter
>>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>>
>>>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>
>>> As mentioned in my reply to Olof's comment on patch #5/6, here's a new
>>> version of this patch adding the safety local_irq_enable() to the core
>>> code.
>>>
>>> ----- >8
>>>
>>> From: Nicolas Pitre <nicolas.pitre@linaro.org>
>>> Subject: idle: move the cpuidle entry point to the generic idle loop
>>>
>>> In order to integrate cpuidle with the scheduler, we must have a better
>>> proximity in the core code with what cpuidle is doing and not delegate
>>> such interaction to arch code.
>>>
>>> Architectures implementing arch_cpu_idle() should simply enter
>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>
>>> In both cases i.e. whether it is a cpuidle driver or the default
>>> arch_cpu_idle(), the calling convention expects IRQs to be disabled
>>> on entry and enabled on exit. There is a warning in place already but
>>> let's add a forced IRQ enable here as well.  This will allow for
>>> removing the forced IRQ enable some implementations do locally and
>>
>> Why would this patch allow for removing the forced IRQ enable that are
>> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
>> the default arch_cpu_idle() to have re-enabled the interrupts after
>> exiting from the default idle state? Its supposed to only catch faulty
>> cpuidle drivers that haven't enabled IRQs on exit from idle state but
>> are expected to have done so, isn't it?
>
> Exact.  However x86 currently does this:
>
> 	if (cpuidle_idle_call())
> 	        x86_idle();
> 	else
> 	        local_irq_enable();
>
> So whenever cpuidle_idle_call() is successful then IRQs are
> unconditionally enabled whether or not the underlying cpuidle driver has
> properly done it or not.  And the reason is that some of the x86 cpuidle
> do fail to enable IRQs before returning.
>
> So the idea is to get rid of this unconditional IRQ enabling and let the
> core issue a warning instead (as well as enabling IRQs to allow the
> system to run).

But what I don't get with your comment is the local_irq_enable is done 
from the cpuidle common framework in 'cpuidle_enter_state' it is not 
done from the arch specific backend cpuidle driver.

So the code above could be:

	if (cpuidle_idle_call())
		x86_idle();

without the else section, this local_irq_enable is pointless. Or may be 
I missed something ?


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply

* Re: [PATCH v2 0/6] setting the table for integration of cpuidle with the scheduler
From: Nicolas Pitre @ 2014-01-30 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <20140130092835.GL11314@laptop.programming.kicks-ass.net>

On Thu, 30 Jan 2014, Peter Zijlstra wrote:

> On Wed, Jan 29, 2014 at 12:45:07PM -0500, Nicolas Pitre wrote:
> > As everyone should know by now, we want to integrate the cpuidle
> > governor with the scheduler for a more efficient idling of CPUs.
> > In order to help the transition, this small patch series moves the
> > existing interaction with cpuidle from architecture code to generic
> > core code.  The ARM, PPC, SH and X86 architectures are concerned.
> > No functional change should have occurred yet.
> > 
> > @peterz: Are you willing to pick up those patches?
> 
> Yeah.. no objections. Should I pick these up or will you be sending
> another round?

I think you could pick them now, taking care of picking up the amended 
#1/6.


Nicolas

^ permalink raw reply

* [PATCH] powerpc/eeh: drop taken reference to driver on eeh_rmv_device
From: Thadeu Lima de Souza Cascardo @ 2014-01-30 13:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: paulus, shangw, Thadeu Lima de Souza Cascardo

Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use
partial hotplug for EEH unaware drivers") introduces eeh_rmv_device,
which may grab a reference to a driver, but not release it.

That prevents a driver from being removed after it has gone through EEH
recovery.

This patch drops the reference in either exit path if it was taken.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7bb30dc..afe7337 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -364,7 +364,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
 		return NULL;
 	driver = eeh_pcid_get(dev);
 	if (driver && driver->err_handler)
-		return NULL;
+		goto out;
 
 	/* Remove it from PCI subsystem */
 	pr_debug("EEH: Removing %s without EEH sensitive driver\n",
@@ -377,6 +377,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	pci_stop_and_remove_bus_device(dev);
 	pci_unlock_rescan_remove();
 
+out:
+	if (driver)
+		eeh_pcid_put(dev);
 	return NULL;
 }
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH 2/2] Fix compile error of pgtable-ppc64.h
From: Greg KH @ 2014-01-30 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, paulus, Aneesh Kumar K.V, stable, Li Zhong
In-Reply-To: <1391036256.8524.81.camel@pasglop>

On Thu, Jan 30, 2014 at 09:57:36AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2014-01-29 at 10:45 -0800, Greg KH wrote:
> > On Tue, Jan 28, 2014 at 05:52:42PM +0530, Aneesh Kumar K.V wrote:
> > > From: Li Zhong <zhong@linux.vnet.ibm.com>
> > > 
> > > It seems that forward declaration couldn't work well with typedef, use
> > > struct spinlock directly to avoiding following build errors:
> > > 
> > > In file included from include/linux/spinlock.h:81,
> > >                  from include/linux/seqlock.h:35,
> > >                  from include/linux/time.h:5,
> > >                  from include/uapi/linux/timex.h:56,
> > >                  from include/linux/timex.h:56,
> > >                  from include/linux/sched.h:17,
> > >                  from arch/powerpc/kernel/asm-offsets.c:17:
> > > include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
> > > /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
> > > 
> > > build fix for upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
> > > for 3.13 stable series
> > 
> > I don't understand, why is this needed?  Is there a corrisponding patch
> > upstream that already does this?  What went wrong with a "normal"
> > backport of the patch to 3.13?
> 
> There's a corresponding patch in powerpc-next that I'm about to send to
> Linus today, but for the backport, the "fix" could be folded into the
> original offending patch.

Oh come on, you know better than to try to send me a patch that isn't in
Linus's tree already.  Crap, I can't take that at all.

Send me the git commit id when it is in Linus's tree, otherwise I'm not
taking it.

And no, don't "fold in" anything, that's not ok either.  I'll just go
drop this patch entirely from all of my -stable trees for now.  Feel
free to resend them when all of the needed stuff is upstream.

greg k-h

^ permalink raw reply

* PCIe Access - achieve bursts without DMA
From: Moese, Michael @ 2014-01-30 12:20 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

Hello PPC-developers,
I'm currently trying to benchmark access speeds to our PCIe-connected IP-co=
res
located inside our FPGA. On x86-based systems I was able to achieve bursts =
for
both read and write access. On PPC32, using an e500v2, I had no success at =
all=20
so far.=20
I tried using ioremap_wc(), like I did on x86, for writing, and it only res=
ults in my
writes just being single requests, one after another.
For reads, I noticed I could not ioremap_cache() on PPC, so I used simple i=
oremap()
here.=20
I used several ways to read from the device, from simple readl(),memcpy_fro=
m_io(),=20
memcpy()  to cacheable_memcpy() - with no improvements.  Even when just iss=
uing
a batch of prefetch()-calls for all the memory to read did not result in re=
ad bursts.

I only get really poor results, writing is possible with around 40 MiByte/s=
, whereas I =20
can read at about only 3 MiByte/s.
After hours of studying the reference manual from freescale, looking into o=
ther code
and searching the web, I'm close to resignation.

Maybe someone of you has some more directions for me, I'd appreciate every =
hint
that leads me to my problem's solution - maybe I just missed something or l=
ack=20
knowledge about this architecture in general.

Thanks for your reading.


Michael

^ permalink raw reply

* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Alexander Graf @ 2014-01-30 10:04 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxppc-dev@lists.ozlabs.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <20140130054913.GA10611@iris.ozlabs.ibm.com>



> Am 30.01.2014 um 06:49 schrieb Paul Mackerras <paulus@samba.org>:
>=20
>> On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote:
>> virtual time base register is a per vm register and need to saved
>> and restored on vm exit and entry. Writing to VTB is not allowed
>> in the privileged mode.
> ...
>=20
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +#define mfvtb()        ({unsigned long rval;                \
>> +            asm volatile("mfspr %0, %1" :            \
>> +                     "=3Dr" (rval) : "i" (SPRN_VTB)); rval;})
>=20
> The mfspr will be a no-op on anything before POWER8, meaning the
> result will be whatever value was in the destination GPR before the
> mfspr.  I suppose that may not matter if the result is only ever used
> when we're running on a POWER8 host, but I would feel more comfortable
> if we had explicit feature tests to make sure of that, rather than
> possibly doing computations with unpredictable values.
>=20
> With your patch, a guest on a POWER7 or a PPC970 could do a read from
> VTB and get garbage -- first, there is nothing to stop userspace from
> requesting POWER8 emulation on an older machine, and secondly, even if
> the virtual machine is a PPC970 (say) you don't implement
> unimplemented SPR semantics for VTB (no-op if PR=3D0, illegal
> instruction interrupt if PR=3D1).
>=20
> On the whole I think it is reasonable to reject an attempt to set the
> virtual PVR to a POWER8 PVR value if we are not running on a POWER8
> host, because emulating all the new POWER8 features in software
> (particularly transactional memory) would not be feasible.  Alex may
> disagree. :)

We don't have a good feature flag indicator that tells kvm what the guest cp=
u is capable of. So yes, I think it's reasonable to just not expose p8 regis=
ters on p8 for now.

In theory it's of course possible to emulate a lot of p8 features on pre-p8 h=
ardware, but I'm not sure it's worth the effort. If anyone wants to spend th=
e time to work on it I'd be happy to tale patches though ;)

Alex

>=20
> Paul.

^ permalink raw reply

* Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
From: Alexander Graf @ 2014-01-30 10:02 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxppc-dev@lists.ozlabs.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <20140130060000.GB10611@iris.ozlabs.ibm.com>



> Am 30.01.2014 um 07:00 schrieb Paul Mackerras <paulus@samba.org>:
>=20
>> On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote:
>> We allow priv-mode update of this. The guest value is saved in fscr,
>> and the value actually used is saved in shadow_fscr. shadow_fscr
>> only contains values that are allowed by the host. On
>> facility unavailable interrupt, if the facility is allowed by fscr
>> but disabled in shadow_fscr we need to emulate the support. Currently
>> all but EBB is disabled. We still don't support performance monitoring
>> in PR guest.
>=20
> ...
>=20
>> +    /*
>> +     * Save the current fscr in shadow fscr
>> +     */
>> +    mfspr r3,SPRN_FSCR
>> +    PPC_STL r3, VCPU_SHADOW_FSCR(r7)
>=20
> I don't think you need to do this.  What could possibly have changed
> FSCR since we loaded it on the way into the guest?

The interrupt cause is part of fscr. But yes, we only meed to store that on a=
n fscr interrupt.

Do we use anything from fscr inside the kernel? Could we switch it lazily on=
 vcpu_load/put?

Alex

>=20
> Paul.

^ permalink raw reply

* Re: [PATCH v2 0/6] setting the table for integration of cpuidle with the scheduler
From: Peter Zijlstra @ 2014-01-30  9:28 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <1391017513-12995-1-git-send-email-nicolas.pitre@linaro.org>

On Wed, Jan 29, 2014 at 12:45:07PM -0500, Nicolas Pitre wrote:
> As everyone should know by now, we want to integrate the cpuidle
> governor with the scheduler for a more efficient idling of CPUs.
> In order to help the transition, this small patch series moves the
> existing interaction with cpuidle from architecture code to generic
> core code.  The ARM, PPC, SH and X86 architectures are concerned.
> No functional change should have occurred yet.
> 
> @peterz: Are you willing to pick up those patches?

Yeah.. no objections. Should I pick these up or will you be sending
another round?

^ permalink raw reply

* Re: [PATCH v2 5/6] X86: remove redundant cpuidle_idle_call()
From: Peter Zijlstra @ 2014-01-30  9:24 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel@lists.linaro.org, Russell King, Linux-sh list,
	linux-pm, Daniel Lezcano, Rafael J. Wysocki,
	linux-kernel@vger.kernel.org, Preeti U Murthy, Paul Mundt,
	Olof Johansson, Thomas Gleixner, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <alpine.LFD.2.11.1401291434120.1652@knanqh.ubzr>

On Wed, Jan 29, 2014 at 03:14:40PM -0500, Nicolas Pitre wrote:
> Looking into some cpuidle drivers for x86 I found at least one that 
> doesn't respect this convention.  Damn.

Which one? We should probably fix it :-)

^ permalink raw reply

* Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
From: Paul Mackerras @ 2014-01-30  6:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1390927455-3312-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote:
> We allow priv-mode update of this. The guest value is saved in fscr,
> and the value actually used is saved in shadow_fscr. shadow_fscr
> only contains values that are allowed by the host. On
> facility unavailable interrupt, if the facility is allowed by fscr
> but disabled in shadow_fscr we need to emulate the support. Currently
> all but EBB is disabled. We still don't support performance monitoring
> in PR guest.

...

> +	/*
> +	 * Save the current fscr in shadow fscr
> +	 */
> +	mfspr r3,SPRN_FSCR
> +	PPC_STL r3, VCPU_SHADOW_FSCR(r7)

I don't think you need to do this.  What could possibly have changed
FSCR since we loaded it on the way into the guest?

Paul.

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Preeti U Murthy @ 2014-01-30  5:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	linux-sh, Olof Johansson, Thomas Gleixner, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401300021530.1652@knanqh.ubzr>

Hi Nicolas,

On 01/30/2014 10:58 AM, Nicolas Pitre wrote:
> On Thu, 30 Jan 2014, Preeti U Murthy wrote:
> 
>> Hi Nicolas,
>>
>> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
>>> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
>>>
>>>> In order to integrate cpuidle with the scheduler, we must have a better
>>>> proximity in the core code with what cpuidle is doing and not delegate
>>>> such interaction to arch code.
>>>>
>>>> Architectures implementing arch_cpu_idle() should simply enter
>>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>>
>>>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>
>>> As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
>>> version of this patch adding the safety local_irq_enable() to the core 
>>> code.
>>>
>>> ----- >8
>>>
>>> From: Nicolas Pitre <nicolas.pitre@linaro.org>
>>> Subject: idle: move the cpuidle entry point to the generic idle loop
>>>
>>> In order to integrate cpuidle with the scheduler, we must have a better
>>> proximity in the core code with what cpuidle is doing and not delegate
>>> such interaction to arch code.
>>>
>>> Architectures implementing arch_cpu_idle() should simply enter
>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>
>>> In both cases i.e. whether it is a cpuidle driver or the default
>>> arch_cpu_idle(), the calling convention expects IRQs to be disabled
>>> on entry and enabled on exit. There is a warning in place already but
>>> let's add a forced IRQ enable here as well.  This will allow for
>>> removing the forced IRQ enable some implementations do locally and 
>>
>> Why would this patch allow for removing the forced IRQ enable that are
>> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
>> the default arch_cpu_idle() to have re-enabled the interrupts after
>> exiting from the default idle state? Its supposed to only catch faulty
>> cpuidle drivers that haven't enabled IRQs on exit from idle state but
>> are expected to have done so, isn't it?
> 
> Exact.  However x86 currently does this:
> 
> 	if (cpuidle_idle_call())
> 	        x86_idle();
> 	else
> 	        local_irq_enable();
> 
> So whenever cpuidle_idle_call() is successful then IRQs are 
> unconditionally enabled whether or not the underlying cpuidle driver has 
> properly done it or not.  And the reason is that some of the x86 cpuidle 
> do fail to enable IRQs before returning.
> 
> So the idea is to get rid of this unconditional IRQ enabling and let the 
> core issue a warning instead (as well as enabling IRQs to allow the 
> system to run).

Oh ok, thank you for clarifying this:)

Regards
Preeti U Murthy
> 
> 
> Nicolas
> 

^ permalink raw reply

* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Paul Mackerras @ 2014-01-30  5:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1390927455-3312-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote:
> virtual time base register is a per vm register and need to saved
> and restored on vm exit and entry. Writing to VTB is not allowed
> in the privileged mode.
...

> +#ifdef CONFIG_PPC_BOOK3S_64
> +#define mfvtb()		({unsigned long rval;				\
> +			asm volatile("mfspr %0, %1" :			\
> +				     "=r" (rval) : "i" (SPRN_VTB)); rval;})

The mfspr will be a no-op on anything before POWER8, meaning the
result will be whatever value was in the destination GPR before the
mfspr.  I suppose that may not matter if the result is only ever used
when we're running on a POWER8 host, but I would feel more comfortable
if we had explicit feature tests to make sure of that, rather than
possibly doing computations with unpredictable values.

With your patch, a guest on a POWER7 or a PPC970 could do a read from
VTB and get garbage -- first, there is nothing to stop userspace from
requesting POWER8 emulation on an older machine, and secondly, even if
the virtual machine is a PPC970 (say) you don't implement
unimplemented SPR semantics for VTB (no-op if PR=0, illegal
instruction interrupt if PR=1).

On the whole I think it is reasonable to reject an attempt to set the
virtual PVR to a POWER8 PVR value if we are not running on a POWER8
host, because emulating all the new POWER8 features in software
(particularly transactional memory) would not be feasible.  Alex may
disagree. :)

Paul.

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Nicolas Pitre @ 2014-01-30  5:28 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	linux-sh, Olof Johansson, Thomas Gleixner, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <52E9C946.50704@linux.vnet.ibm.com>

On Thu, 30 Jan 2014, Preeti U Murthy wrote:

> Hi Nicolas,
> 
> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
> > On Wed, 29 Jan 2014, Nicolas Pitre wrote:
> > 
> >> In order to integrate cpuidle with the scheduler, we must have a better
> >> proximity in the core code with what cpuidle is doing and not delegate
> >> such interaction to arch code.
> >>
> >> Architectures implementing arch_cpu_idle() should simply enter
> >> a cheap idle mode in the absence of a proper cpuidle driver.
> >>
> >> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> >> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
> > version of this patch adding the safety local_irq_enable() to the core 
> > code.
> > 
> > ----- >8
> > 
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Subject: idle: move the cpuidle entry point to the generic idle loop
> > 
> > In order to integrate cpuidle with the scheduler, we must have a better
> > proximity in the core code with what cpuidle is doing and not delegate
> > such interaction to arch code.
> > 
> > Architectures implementing arch_cpu_idle() should simply enter
> > a cheap idle mode in the absence of a proper cpuidle driver.
> > 
> > In both cases i.e. whether it is a cpuidle driver or the default
> > arch_cpu_idle(), the calling convention expects IRQs to be disabled
> > on entry and enabled on exit. There is a warning in place already but
> > let's add a forced IRQ enable here as well.  This will allow for
> > removing the forced IRQ enable some implementations do locally and 
> 
> Why would this patch allow for removing the forced IRQ enable that are
> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
> the default arch_cpu_idle() to have re-enabled the interrupts after
> exiting from the default idle state? Its supposed to only catch faulty
> cpuidle drivers that haven't enabled IRQs on exit from idle state but
> are expected to have done so, isn't it?

Exact.  However x86 currently does this:

	if (cpuidle_idle_call())
	        x86_idle();
	else
	        local_irq_enable();

So whenever cpuidle_idle_call() is successful then IRQs are 
unconditionally enabled whether or not the underlying cpuidle driver has 
properly done it or not.  And the reason is that some of the x86 cpuidle 
do fail to enable IRQs before returning.

So the idea is to get rid of this unconditional IRQ enabling and let the 
core issue a warning instead (as well as enabling IRQs to allow the 
system to run).


Nicolas

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Preeti U Murthy @ 2014-01-30  3:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	linux-sh, Olof Johansson, Thomas Gleixner, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401291526320.1652@knanqh.ubzr>

Hi Nicolas,

On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
> 
>> In order to integrate cpuidle with the scheduler, we must have a better
>> proximity in the core code with what cpuidle is doing and not delegate
>> such interaction to arch code.
>>
>> Architectures implementing arch_cpu_idle() should simply enter
>> a cheap idle mode in the absence of a proper cpuidle driver.
>>
>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> 
> As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
> version of this patch adding the safety local_irq_enable() to the core 
> code.
> 
> ----- >8
> 
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Subject: idle: move the cpuidle entry point to the generic idle loop
> 
> In order to integrate cpuidle with the scheduler, we must have a better
> proximity in the core code with what cpuidle is doing and not delegate
> such interaction to arch code.
> 
> Architectures implementing arch_cpu_idle() should simply enter
> a cheap idle mode in the absence of a proper cpuidle driver.
> 
> In both cases i.e. whether it is a cpuidle driver or the default
> arch_cpu_idle(), the calling convention expects IRQs to be disabled
> on entry and enabled on exit. There is a warning in place already but
> let's add a forced IRQ enable here as well.  This will allow for
> removing the forced IRQ enable some implementations do locally and 

Why would this patch allow for removing the forced IRQ enable that are
being done on some archs in arch_cpu_idle()? Isn't this patch expecting
the default arch_cpu_idle() to have re-enabled the interrupts after
exiting from the default idle state? Its supposed to only catch faulty
cpuidle drivers that haven't enabled IRQs on exit from idle state but
are expected to have done so, isn't it?

Thanks

Regards
Preeti U Murthy

> allowing for the warning to trig.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
> index 988573a9a3..14ca43430a 100644
> --- a/kernel/cpu/idle.c
> +++ b/kernel/cpu/idle.c
> @@ -3,6 +3,7 @@
>   */
>  #include <linux/sched.h>
>  #include <linux/cpu.h>
> +#include <linux/cpuidle.h>
>  #include <linux/tick.h>
>  #include <linux/mm.h>
>  #include <linux/stackprotector.h>
> @@ -95,8 +96,10 @@ static void cpu_idle_loop(void)
>  				if (!current_clr_polling_and_test()) {
>  					stop_critical_timings();
>  					rcu_idle_enter();
> -					arch_cpu_idle();
> -					WARN_ON_ONCE(irqs_disabled());
> +					if (cpuidle_idle_call())
> +						arch_cpu_idle();
> +					if (WARN_ON_ONCE(irqs_disabled()))
> +						local_irq_enable();
>  					rcu_idle_exit();
>  					start_critical_timings();
>  				} else {
> 

^ permalink raw reply

* Re: [PATCH] powerpc: Add cpu family documentation
From: Stephen Rothwell @ 2014-01-30  3:32 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1391049480-29346-1-git-send-email-mpe@ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 617 bytes --]

Hi Michael,

Nice.

On Thu, 30 Jan 2014 13:38:00 +1100 Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> +++ b/Documentation/powerpc/cpu_families.txt
> @@ -0,0 +1,76 @@
> +CPU Families
> +============
> +
> +This doco tries to summarise some of the different cpu families that exist and
        ^^^^
document

> +   |            |
> +   |            *---- [620] --- POWER3/630 --- POWER3+ --- POWER4 --- POWER4+ --- POWER5 --- POWER5+ --- POWER5++ --- POWER6 --- POWER7 --- POWER7+ --- POWER8

Its a pity that this wraps ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [git pull] Please pull powerpc.git next branch
From: Benjamin Herrenschmidt @ 2014-01-30  2:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev, Linux Kernel list

Hi Linus !

Here are a few more powerpc bits for this merge window. The bulk is made
of two pull requests from Scott and Anatolij that I had missed previously
(they arrived while I was away). Since both their branches are in -next
independently, and the content has been around for a little while, they
can still go in.

The rest is mostly bug and regression fixes, a small series of
cleanups to our pseries cpuidle code (including moving it to
the right place), and one new cpuidle bakend for the powernv
platform. I also wired up the new sched_attr syscalls.

Cheers,
Ben.

The following changes since commit d891ea23d5203e5c47439b2a174f86a00b356a6c:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (2014-01-28 11:02:23 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next

for you to fetch changes up to f878f84373aefda7f041a74b24a83b8b7dec1cf0:

  powerpc: Wire up sched_setattr and sched_getattr syscalls (2014-01-29 17:13:05 +1100)

----------------------------------------------------------------
Alistair Popple (1):
      powerpc/iommu: Fix initialisation of DART iommu table

Andreas Schwab (1):
      powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations

Benjamin Herrenschmidt (3):
      Merge remote-tracking branch 'agust/next' into next
      Merge remote-tracking branch 'scott/next' into next
      powerpc: Wire up sched_setattr and sched_getattr syscalls

Deepthi Dharwar (6):
      powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
      powerpc/pseries/cpuidle: Use cpuidle_register() for initialisation.
      powerpc/pseries/cpuidle: Make cpuidle-pseries backend driver a non-module.
      powerpc/pseries/cpuidle: Remove MAX_IDLE_STATE macro.
      powerpc/pseries/cpuidle: smt-snooze-delay cleanup.
      powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform.

Gerhard Sittig (20):
      dts: mpc512x: introduce dt-bindings/clock/ header
      dts: mpc512x: add clock related device tree specs
      clk: mpc512x: introduce COMMON_CLK for MPC512x (disabled)
      clk: mpc512x: add backwards compat to the CCF code
      dts: mpc512x: add clock specs for client lookups
      clk: mpc5xxx: switch to COMMON_CLK, retire PPC_CLOCK
      spi: mpc512x: adjust to OF based clock lookup
      serial: mpc512x: adjust for OF based clock lookup
      serial: mpc512x: setup the PSC FIFO clock as well
      USB: fsl-mph-dr-of: adjust for OF based clock lookup
      mtd: mpc5121_nfc: adjust for OF based clock lookup
      fsl-viu: adjust for OF based clock lookup
      net: can: mscan: adjust to common clock support for mpc512x
      net: can: mscan: remove non-CCF code for MPC512x
      powerpc/mpc512x: improve DIU related clock setup
      clk: mpc512x: remove migration support workarounds
      powerpc/512x: clk: minor comment updates
      powerpc/512x: clk: enforce even SDHC divider values
      powerpc/512x: clk: support MPC5121/5123/5125 SoC variants
      powerpc/512x: dts: add MPC5125 clock specs

Joe Perches (1):
      powerpc/numa: Fix decimal permissions

Li Zhong (1):
      powerpc/mm: Fix compile error of pgtable-ppc64.h

Paul Mackerras (2):
      powerpc: Fix 32-bit frames for signals delivered when transactional
      powerpc: Make sure "cache" directory is removed when offlining cpu

Scott Wood (1):
      powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E

Tang Yuantian (1):
      clk: corenet: Adds the clock binding

Tiejun Chen (1):
      powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var

jmarchan@redhat.com (1):
      powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space

 .../devicetree/bindings/clock/corenet-clock.txt    |  134 +++
 arch/powerpc/Kconfig                               |    6 +-
 arch/powerpc/boot/dts/ac14xx.dts                   |    7 +
 arch/powerpc/boot/dts/mpc5121.dtsi                 |  113 +-
 arch/powerpc/boot/dts/mpc5125twr.dts               |   53 +-
 arch/powerpc/include/asm/clk_interface.h           |   20 -
 arch/powerpc/include/asm/mpc5121.h                 |    7 +-
 arch/powerpc/include/asm/pgtable-ppc64.h           |    6 +-
 arch/powerpc/include/asm/processor.h               |    7 -
 arch/powerpc/include/asm/systbl.h                  |    2 +
 arch/powerpc/include/asm/unistd.h                  |    2 +-
 arch/powerpc/include/uapi/asm/unistd.h             |    3 +-
 arch/powerpc/kernel/Makefile                       |    1 -
 arch/powerpc/kernel/cacheinfo.c                    |    3 +
 arch/powerpc/kernel/clock.c                        |   82 --
 arch/powerpc/kernel/process.c                      |    2 +-
 arch/powerpc/kernel/signal_32.c                    |   19 +-
 arch/powerpc/kernel/sysfs.c                        |    2 -
 arch/powerpc/mm/hugetlbpage.c                      |    4 +-
 arch/powerpc/mm/numa.c                             |    2 +-
 arch/powerpc/mm/slice.c                            |    2 +-
 arch/powerpc/mm/tlb_low_64e.S                      |    3 +-
 arch/powerpc/mm/tlb_nohash.c                       |    2 +
 arch/powerpc/platforms/512x/Kconfig                |    2 +-
 arch/powerpc/platforms/512x/Makefile               |    3 +-
 arch/powerpc/platforms/512x/clock-commonclk.c      | 1221 ++++++++++++++++++++
 arch/powerpc/platforms/512x/clock.c                |  754 ------------
 arch/powerpc/platforms/512x/mpc512x_shared.c       |  169 +--
 arch/powerpc/platforms/52xx/Kconfig                |    2 +-
 arch/powerpc/platforms/powernv/setup.c             |   13 +-
 arch/powerpc/platforms/pseries/Kconfig             |    9 -
 arch/powerpc/platforms/pseries/Makefile            |    1 -
 arch/powerpc/sysdev/dart_iommu.c                   |    1 +
 drivers/cpuidle/Kconfig                            |    5 +
 drivers/cpuidle/Kconfig.powerpc                    |   20 +
 drivers/cpuidle/Makefile                           |    5 +
 drivers/cpuidle/cpuidle-powernv.c                  |  169 +++
 .../cpuidle/cpuidle-pseries.c                      |   79 +-
 drivers/media/platform/fsl-viu.c                   |    2 +-
 drivers/mtd/nand/mpc5121_nfc.c                     |    2 +-
 drivers/net/can/mscan/mpc5xxx_can.c                |  270 +++--
 drivers/spi/spi-mpc512x-psc.c                      |   26 +-
 drivers/tty/serial/mpc52xx_uart.c                  |   90 +-
 drivers/usb/host/fsl-mph-dr-of.c                   |   13 +-
 include/dt-bindings/clock/mpc512x-clock.h          |   76 ++
 include/linux/clk-provider.h                       |   16 +
 46 files changed, 2234 insertions(+), 1196 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/corenet-clock.txt
 delete mode 100644 arch/powerpc/include/asm/clk_interface.h
 delete mode 100644 arch/powerpc/kernel/clock.c
 create mode 100644 arch/powerpc/platforms/512x/clock-commonclk.c
 delete mode 100644 arch/powerpc/platforms/512x/clock.c
 create mode 100644 drivers/cpuidle/Kconfig.powerpc
 create mode 100644 drivers/cpuidle/cpuidle-powernv.c
 rename arch/powerpc/platforms/pseries/processor_idle.c => drivers/cpuidle/cpuidle-pseries.c (75%)
 create mode 100644 include/dt-bindings/clock/mpc512x-clock.h

^ permalink raw reply

* [PATCH] powerpc: Add cpu family documentation
From: Michael Ellerman @ 2014-01-30  2:38 UTC (permalink / raw)
  To: linuxppc-dev

This patch adds some documentation on the different cpu families
supported by arch/powerpc.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 Documentation/powerpc/cpu_families.txt | 76 ++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 Documentation/powerpc/cpu_families.txt

diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
new file mode 100644
index 0000000..df72657
--- /dev/null
+++ b/Documentation/powerpc/cpu_families.txt
@@ -0,0 +1,76 @@
+CPU Families
+============
+
+This doco tries to summarise some of the different cpu families that exist and
+are supported by arch/powerpc.
+
+Book3S (aka sPAPR)
+------------------
+
+ - Hash MMU
+ - Mix of 32 & 64 bit
+
+  Old
+ POWER --- 601 --- 603
+   |            |   |
+   |            |   *----- 740
+   |            |   |
+   |            |   *----- 750 (G3) --- 750CX --- 750CL --- 750FX
+   |            |           |
+   |            |           |
+   |           604          *--- 7400 --- 7410 --- 7450 --- 7455 --- 7447 --- 7448
+   |            |
+   |            |
+   |            *---- [620] --- POWER3/630 --- POWER3+ --- POWER4 --- POWER4+ --- POWER5 --- POWER5+ --- POWER5++ --- POWER6 --- POWER7 --- POWER7+ --- POWER8
+   |                 (64bit)                                                   |    .
+   |                                                                           |    .
+   |                                                                           |    *--- Cell
+   |                                                                           |
+   |                                                                           *--- 970 --- 970FX --- 970MP
+   |
+   *--- RS64 (threads)
+
+
+  PA6T (64bit) ...
+
+
+IBM BookE
+---------
+
+ - Software loaded TLB.
+ - All 32 bit
+
+  401 --- 403 --- 405 --- 440 --- 450 --- 460 --- 476
+                                   |
+                                   *--- BG/P
+
+
+Motorola/Freescale 8xx
+----------------------
+
+ - Software loaded with hardware assist.
+ - All 32 bit
+
+  8xx --- 850
+
+
+Freescale BookE
+---------------
+
+ - Software loaded TLB.
+ - e6500 adds HW loaded indirect TLB entries.
+ - Mix of 32 & 64 bit
+
+  e200 --- e500 --- e500v2 --- e500mc --- e5500 --- e6500
+                                         (Book3E)  (HW TLB)
+                                         (64bit)
+
+IBM A2 core
+-----------
+
+ - Book3E, software loaded TLB + HW loaded indirect TLB entries.
+ - 64 bit
+
+  A2 core --- BG/Q
+     |
+     *------- WSP
-- 
1.8.3.2

^ permalink raw reply related

* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Benjamin Herrenschmidt @ 2014-01-30  0:35 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, paulus, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <1391036081.8524.80.camel@pasglop>

On Thu, 2014-01-30 at 09:54 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2014-01-29 at 17:39 +0100, Alexander Graf wrote:
> > static inline mfvtb(unsigned long)
> > {
> > #ifdef CONFIG_PPC_BOOK3S_64
> >      return mfspr(SPRN_VTB);
> > #else
> >      BUG();
> > #endif
> > }
> > 
> > is a lot easier to read and get right. But reg.h is Ben's call.
> 
> Agreed.

I mean I agree with Alex, his version is nicer :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 2/2] Fix compile error of pgtable-ppc64.h
From: Benjamin Herrenschmidt @ 2014-01-29 22:57 UTC (permalink / raw)
  To: Greg KH; +Cc: linuxppc-dev, paulus, Aneesh Kumar K.V, stable, Li Zhong
In-Reply-To: <20140129184544.GA23204@kroah.com>

On Wed, 2014-01-29 at 10:45 -0800, Greg KH wrote:
> On Tue, Jan 28, 2014 at 05:52:42PM +0530, Aneesh Kumar K.V wrote:
> > From: Li Zhong <zhong@linux.vnet.ibm.com>
> > 
> > It seems that forward declaration couldn't work well with typedef, use
> > struct spinlock directly to avoiding following build errors:
> > 
> > In file included from include/linux/spinlock.h:81,
> >                  from include/linux/seqlock.h:35,
> >                  from include/linux/time.h:5,
> >                  from include/uapi/linux/timex.h:56,
> >                  from include/linux/timex.h:56,
> >                  from include/linux/sched.h:17,
> >                  from arch/powerpc/kernel/asm-offsets.c:17:
> > include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
> > /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
> > 
> > build fix for upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
> > for 3.13 stable series
> 
> I don't understand, why is this needed?  Is there a corrisponding patch
> upstream that already does this?  What went wrong with a "normal"
> backport of the patch to 3.13?

There's a corresponding patch in powerpc-next that I'm about to send to
Linus today, but for the backport, the "fix" could be folded into the
original offending patch.

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Benjamin Herrenschmidt @ 2014-01-29 22:54 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, paulus, Aneesh Kumar K.V, kvm-ppc, kvm
In-Reply-To: <52E92EC6.2030607@suse.de>

On Wed, 2014-01-29 at 17:39 +0100, Alexander Graf wrote:
> static inline mfvtb(unsigned long)
> {
> #ifdef CONFIG_PPC_BOOK3S_64
>      return mfspr(SPRN_VTB);
> #else
>      BUG();
> #endif
> }
> 
> is a lot easier to read and get right. But reg.h is Ben's call.

Agreed.

> Also could you please give me a pointer to the specification for it? I 
> tried to look up vtb in the 2.06 ISA and couldn't find it. Is it a CPU 
> specific register?

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Nishanth Aravamudan @ 2014-01-29 22:36 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Han Pingtian, mpm, penberg, linux-mm, cody, paulus,
	Anton Blanchard, David Rientjes, Christoph Lameter, linuxppc-dev,
	Wanpeng Li
In-Reply-To: <20140128182947.GA1591@linux.vnet.ibm.com>

On 28.01.2014 [10:29:47 -0800], Nishanth Aravamudan wrote:
> On 27.01.2014 [14:58:05 +0900], Joonsoo Kim wrote:
> > On Fri, Jan 24, 2014 at 05:10:42PM -0800, Nishanth Aravamudan wrote:
> > > On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
> > > > On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> > > > 
> > > > > Thank you for clarifying and providing  a test patch. I ran with this on
> > > > > the system showing the original problem, configured to have 15GB of
> > > > > memory.
> > > > > 
> > > > > With your patch after boot:
> > > > > 
> > > > > MemTotal:       15604736 kB
> > > > > MemFree:         8768192 kB
> > > > > Slab:            3882560 kB
> > > > > SReclaimable:     105408 kB
> > > > > SUnreclaim:      3777152 kB
> > > > > 
> > > > > With Anton's patch after boot:
> > > > > 
> > > > > MemTotal:       15604736 kB
> > > > > MemFree:        11195008 kB
> > > > > Slab:            1427968 kB
> > > > > SReclaimable:     109184 kB
> > > > > SUnreclaim:      1318784 kB
> > > > > 
> > > > > 
> > > > > I know that's fairly unscientific, but the numbers are reproducible. 
> > > > > 
> > 
> > Hello,
> > 
> > I think that there is one mistake on David's patch although I'm not sure
> > that it is the reason for this result.
> > 
> > With David's patch, get_partial() in new_slab_objects() doesn't work
> > properly, because we only change node id in !node_match() case. If we
> > meet just !freelist case, we pass node id directly to
> > new_slab_objects(), so we always try to allocate new slab page
> > regardless existence of partial pages. We should solve it.
> > 
> > Could you try this one?
> 
> This helps about the same as David's patch -- but I found the reason
> why! ppc64 doesn't set CONFIG_HAVE_MEMORYLESS_NODES :) Expect a patch
> shortly for that and one other case I found.
> 
> This patch on its own seems to help on our test system by saving around
> 1.5GB of slab.
> 
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> 
> with the caveat below.
> 
> Thanks,
> Nish
> 
> > 
> > Thanks.
> > 
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1698,8 +1698,10 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >                 struct kmem_cache_cpu *c)
> >  {
> >         void *object;
> > -       int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> > +       int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
> > 
> > +       if (node != NUMA_NO_NODE && !node_present_pages(node))
> > +               searchnode = numa_mem_id();
> 
> This might be clearer as:
> 
> int searchnode = node;
> if (node == NUMA_NO_NODE || !node_present_pages(node))
> 	searchnode = numa_mem_id();

Cody Schafer mentioned to me on IRC that this may not always reflect
exactly what the caller intends.

int searchnode = node;
if (node == NUMA_NO_NODE)
	searchnode = numa_mem_id();
if (!node_present_pages(node))
	searchnode = local_memory_node(node);

The difference in semantics from the previous is that here, if we have a
memoryless node, rather than using the CPU's nearest NUMA node, we use
the NUMA node closest to the requested one?

> >         object = get_partial_node(s, get_node(s, searchnode), c, flags);
> >         if (object || node != NUMA_NO_NODE)
> >                 return object;
> > @@ -2278,10 +2280,14 @@ redo:
> > 
> >         if (unlikely(!node_match(page, node))) {
> >                 stat(s, ALLOC_NODE_MISMATCH);
> > -               deactivate_slab(s, page, c->freelist);
> > -               c->page = NULL;
> > -               c->freelist = NULL;
> > -               goto new_slab;
> > +               if (unlikely(!node_present_pages(node)))
> > +                       node = numa_mem_id();

Similarly here?

-Nish

> > +               if (!node_match(page, node)) {
> > +                       deactivate_slab(s, page, c->freelist);
> > +                       c->page = NULL;
> > +                       c->freelist = NULL;
> > +                       goto new_slab;
> > +               }
> >         }
> > 
> >         /*
> > 

^ permalink raw reply

* Re: [PATCH 2/2] Fix compile error of pgtable-ppc64.h
From: Greg KH @ 2014-01-29 18:45 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, stable, Li Zhong
In-Reply-To: <1390911762-5659-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, Jan 28, 2014 at 05:52:42PM +0530, Aneesh Kumar K.V wrote:
> From: Li Zhong <zhong@linux.vnet.ibm.com>
> 
> It seems that forward declaration couldn't work well with typedef, use
> struct spinlock directly to avoiding following build errors:
> 
> In file included from include/linux/spinlock.h:81,
>                  from include/linux/seqlock.h:35,
>                  from include/linux/time.h:5,
>                  from include/uapi/linux/timex.h:56,
>                  from include/linux/timex.h:56,
>                  from include/linux/sched.h:17,
>                  from arch/powerpc/kernel/asm-offsets.c:17:
> include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
> /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
> 
> build fix for upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
> for 3.13 stable series

I don't understand, why is this needed?  Is there a corrisponding patch
upstream that already does this?  What went wrong with a "normal"
backport of the patch to 3.13?

confused,

greg k-h

^ permalink raw reply

* Re: Please pull 'next' branch of 5xxx tree
From: Anatolij Gustschin @ 2014-01-29 20:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1390981569.8524.53.camel@pasglop>

Hi Ben !

On Wed, 29 Jan 2014 18:46:09 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
...
> Hrm, I get a merge conflicts with spi-mpc512x-psc.c, please check that I
> fixed it up properly in powerpc-next and let me know.

your fix is correct.

Thanks,

Anatolij

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Nicolas Pitre @ 2014-01-29 20:31 UTC (permalink / raw)
  To: Olof Johansson, Russell King, Benjamin Herrenschmidt,
	Preeti U Murthy, Paul Mundt, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Rafael J. Wysocki, Daniel Lezcano
  Cc: linaro-kernel, linux-sh, linux-pm, linux-kernel, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <1391017513-12995-2-git-send-email-nicolas.pitre@linaro.org>

On Wed, 29 Jan 2014, Nicolas Pitre wrote:

> In order to integrate cpuidle with the scheduler, we must have a better
> proximity in the core code with what cpuidle is doing and not delegate
> such interaction to arch code.
> 
> Architectures implementing arch_cpu_idle() should simply enter
> a cheap idle mode in the absence of a proper cpuidle driver.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
version of this patch adding the safety local_irq_enable() to the core 
code.

----- >8

From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop

In order to integrate cpuidle with the scheduler, we must have a better
proximity in the core code with what cpuidle is doing and not delegate
such interaction to arch code.

Architectures implementing arch_cpu_idle() should simply enter
a cheap idle mode in the absence of a proper cpuidle driver.

In both cases i.e. whether it is a cpuidle driver or the default
arch_cpu_idle(), the calling convention expects IRQs to be disabled
on entry and enabled on exit. There is a warning in place already but
let's add a forced IRQ enable here as well.  This will allow for
removing the forced IRQ enable some implementations do locally and 
allowing for the warning to trig.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index 988573a9a3..14ca43430a 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -3,6 +3,7 @@
  */
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/cpuidle.h>
 #include <linux/tick.h>
 #include <linux/mm.h>
 #include <linux/stackprotector.h>
@@ -95,8 +96,10 @@ static void cpu_idle_loop(void)
 				if (!current_clr_polling_and_test()) {
 					stop_critical_timings();
 					rcu_idle_enter();
-					arch_cpu_idle();
-					WARN_ON_ONCE(irqs_disabled());
+					if (cpuidle_idle_call())
+						arch_cpu_idle();
+					if (WARN_ON_ONCE(irqs_disabled()))
+						local_irq_enable();
 					rcu_idle_exit();
 					start_critical_timings();
 				} else {

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox