LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Preeti U Murthy @ 2014-01-30  3:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	linux-sh, Olof Johansson, Thomas Gleixner, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401291526320.1652@knanqh.ubzr>

Hi Nicolas,

On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
> 
>> In order to integrate cpuidle with the scheduler, we must have a better
>> proximity in the core code with what cpuidle is doing and not delegate
>> such interaction to arch code.
>>
>> Architectures implementing arch_cpu_idle() should simply enter
>> a cheap idle mode in the absence of a proper cpuidle driver.
>>
>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> 
> As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
> version of this patch adding the safety local_irq_enable() to the core 
> code.
> 
> ----- >8
> 
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Subject: idle: move the cpuidle entry point to the generic idle loop
> 
> In order to integrate cpuidle with the scheduler, we must have a better
> proximity in the core code with what cpuidle is doing and not delegate
> such interaction to arch code.
> 
> Architectures implementing arch_cpu_idle() should simply enter
> a cheap idle mode in the absence of a proper cpuidle driver.
> 
> In both cases i.e. whether it is a cpuidle driver or the default
> arch_cpu_idle(), the calling convention expects IRQs to be disabled
> on entry and enabled on exit. There is a warning in place already but
> let's add a forced IRQ enable here as well.  This will allow for
> removing the forced IRQ enable some implementations do locally and 

Why would this patch allow for removing the forced IRQ enable that are
being done on some archs in arch_cpu_idle()? Isn't this patch expecting
the default arch_cpu_idle() to have re-enabled the interrupts after
exiting from the default idle state? Its supposed to only catch faulty
cpuidle drivers that haven't enabled IRQs on exit from idle state but
are expected to have done so, isn't it?

Thanks

Regards
Preeti U Murthy

> allowing for the warning to trig.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
> index 988573a9a3..14ca43430a 100644
> --- a/kernel/cpu/idle.c
> +++ b/kernel/cpu/idle.c
> @@ -3,6 +3,7 @@
>   */
>  #include <linux/sched.h>
>  #include <linux/cpu.h>
> +#include <linux/cpuidle.h>
>  #include <linux/tick.h>
>  #include <linux/mm.h>
>  #include <linux/stackprotector.h>
> @@ -95,8 +96,10 @@ static void cpu_idle_loop(void)
>  				if (!current_clr_polling_and_test()) {
>  					stop_critical_timings();
>  					rcu_idle_enter();
> -					arch_cpu_idle();
> -					WARN_ON_ONCE(irqs_disabled());
> +					if (cpuidle_idle_call())
> +						arch_cpu_idle();
> +					if (WARN_ON_ONCE(irqs_disabled()))
> +						local_irq_enable();
>  					rcu_idle_exit();
>  					start_critical_timings();
>  				} else {
> 

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Nicolas Pitre @ 2014-01-30  5:28 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	linux-sh, Olof Johansson, Thomas Gleixner, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <52E9C946.50704@linux.vnet.ibm.com>

On Thu, 30 Jan 2014, Preeti U Murthy wrote:

> Hi Nicolas,
> 
> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
> > On Wed, 29 Jan 2014, Nicolas Pitre wrote:
> > 
> >> In order to integrate cpuidle with the scheduler, we must have a better
> >> proximity in the core code with what cpuidle is doing and not delegate
> >> such interaction to arch code.
> >>
> >> Architectures implementing arch_cpu_idle() should simply enter
> >> a cheap idle mode in the absence of a proper cpuidle driver.
> >>
> >> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> >> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
> > version of this patch adding the safety local_irq_enable() to the core 
> > code.
> > 
> > ----- >8
> > 
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Subject: idle: move the cpuidle entry point to the generic idle loop
> > 
> > In order to integrate cpuidle with the scheduler, we must have a better
> > proximity in the core code with what cpuidle is doing and not delegate
> > such interaction to arch code.
> > 
> > Architectures implementing arch_cpu_idle() should simply enter
> > a cheap idle mode in the absence of a proper cpuidle driver.
> > 
> > In both cases i.e. whether it is a cpuidle driver or the default
> > arch_cpu_idle(), the calling convention expects IRQs to be disabled
> > on entry and enabled on exit. There is a warning in place already but
> > let's add a forced IRQ enable here as well.  This will allow for
> > removing the forced IRQ enable some implementations do locally and 
> 
> Why would this patch allow for removing the forced IRQ enable that are
> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
> the default arch_cpu_idle() to have re-enabled the interrupts after
> exiting from the default idle state? Its supposed to only catch faulty
> cpuidle drivers that haven't enabled IRQs on exit from idle state but
> are expected to have done so, isn't it?

Exact.  However x86 currently does this:

	if (cpuidle_idle_call())
	        x86_idle();
	else
	        local_irq_enable();

So whenever cpuidle_idle_call() is successful then IRQs are 
unconditionally enabled whether or not the underlying cpuidle driver has 
properly done it or not.  And the reason is that some of the x86 cpuidle 
do fail to enable IRQs before returning.

So the idea is to get rid of this unconditional IRQ enabling and let the 
core issue a warning instead (as well as enabling IRQs to allow the 
system to run).


Nicolas

^ permalink raw reply

* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Paul Mackerras @ 2014-01-30  5:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1390927455-3312-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote:
> virtual time base register is a per vm register and need to saved
> and restored on vm exit and entry. Writing to VTB is not allowed
> in the privileged mode.
...

> +#ifdef CONFIG_PPC_BOOK3S_64
> +#define mfvtb()		({unsigned long rval;				\
> +			asm volatile("mfspr %0, %1" :			\
> +				     "=r" (rval) : "i" (SPRN_VTB)); rval;})

The mfspr will be a no-op on anything before POWER8, meaning the
result will be whatever value was in the destination GPR before the
mfspr.  I suppose that may not matter if the result is only ever used
when we're running on a POWER8 host, but I would feel more comfortable
if we had explicit feature tests to make sure of that, rather than
possibly doing computations with unpredictable values.

With your patch, a guest on a POWER7 or a PPC970 could do a read from
VTB and get garbage -- first, there is nothing to stop userspace from
requesting POWER8 emulation on an older machine, and secondly, even if
the virtual machine is a PPC970 (say) you don't implement
unimplemented SPR semantics for VTB (no-op if PR=0, illegal
instruction interrupt if PR=1).

On the whole I think it is reasonable to reject an attempt to set the
virtual PVR to a POWER8 PVR value if we are not running on a POWER8
host, because emulating all the new POWER8 features in software
(particularly transactional memory) would not be feasible.  Alex may
disagree. :)

Paul.

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Preeti U Murthy @ 2014-01-30  5:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	linux-sh, Olof Johansson, Thomas Gleixner, linuxppc-dev,
	Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401300021530.1652@knanqh.ubzr>

Hi Nicolas,

On 01/30/2014 10:58 AM, Nicolas Pitre wrote:
> On Thu, 30 Jan 2014, Preeti U Murthy wrote:
> 
>> Hi Nicolas,
>>
>> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
>>> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
>>>
>>>> In order to integrate cpuidle with the scheduler, we must have a better
>>>> proximity in the core code with what cpuidle is doing and not delegate
>>>> such interaction to arch code.
>>>>
>>>> Architectures implementing arch_cpu_idle() should simply enter
>>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>>
>>>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>
>>> As mentioned in my reply to Olof's comment on patch #5/6, here's a new 
>>> version of this patch adding the safety local_irq_enable() to the core 
>>> code.
>>>
>>> ----- >8
>>>
>>> From: Nicolas Pitre <nicolas.pitre@linaro.org>
>>> Subject: idle: move the cpuidle entry point to the generic idle loop
>>>
>>> In order to integrate cpuidle with the scheduler, we must have a better
>>> proximity in the core code with what cpuidle is doing and not delegate
>>> such interaction to arch code.
>>>
>>> Architectures implementing arch_cpu_idle() should simply enter
>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>
>>> In both cases i.e. whether it is a cpuidle driver or the default
>>> arch_cpu_idle(), the calling convention expects IRQs to be disabled
>>> on entry and enabled on exit. There is a warning in place already but
>>> let's add a forced IRQ enable here as well.  This will allow for
>>> removing the forced IRQ enable some implementations do locally and 
>>
>> Why would this patch allow for removing the forced IRQ enable that are
>> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
>> the default arch_cpu_idle() to have re-enabled the interrupts after
>> exiting from the default idle state? Its supposed to only catch faulty
>> cpuidle drivers that haven't enabled IRQs on exit from idle state but
>> are expected to have done so, isn't it?
> 
> Exact.  However x86 currently does this:
> 
> 	if (cpuidle_idle_call())
> 	        x86_idle();
> 	else
> 	        local_irq_enable();
> 
> So whenever cpuidle_idle_call() is successful then IRQs are 
> unconditionally enabled whether or not the underlying cpuidle driver has 
> properly done it or not.  And the reason is that some of the x86 cpuidle 
> do fail to enable IRQs before returning.
> 
> So the idea is to get rid of this unconditional IRQ enabling and let the 
> core issue a warning instead (as well as enabling IRQs to allow the 
> system to run).

Oh ok, thank you for clarifying this:)

Regards
Preeti U Murthy
> 
> 
> Nicolas
> 

^ permalink raw reply

* Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
From: Paul Mackerras @ 2014-01-30  6:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <1390927455-3312-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote:
> We allow priv-mode update of this. The guest value is saved in fscr,
> and the value actually used is saved in shadow_fscr. shadow_fscr
> only contains values that are allowed by the host. On
> facility unavailable interrupt, if the facility is allowed by fscr
> but disabled in shadow_fscr we need to emulate the support. Currently
> all but EBB is disabled. We still don't support performance monitoring
> in PR guest.

...

> +	/*
> +	 * Save the current fscr in shadow fscr
> +	 */
> +	mfspr r3,SPRN_FSCR
> +	PPC_STL r3, VCPU_SHADOW_FSCR(r7)

I don't think you need to do this.  What could possibly have changed
FSCR since we loaded it on the way into the guest?

Paul.

^ permalink raw reply

* Re: [PATCH v2 5/6] X86: remove redundant cpuidle_idle_call()
From: Peter Zijlstra @ 2014-01-30  9:24 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel@lists.linaro.org, Russell King, Linux-sh list,
	linux-pm, Daniel Lezcano, Rafael J. Wysocki,
	linux-kernel@vger.kernel.org, Preeti U Murthy, Paul Mundt,
	Olof Johansson, Thomas Gleixner, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <alpine.LFD.2.11.1401291434120.1652@knanqh.ubzr>

On Wed, Jan 29, 2014 at 03:14:40PM -0500, Nicolas Pitre wrote:
> Looking into some cpuidle drivers for x86 I found at least one that 
> doesn't respect this convention.  Damn.

Which one? We should probably fix it :-)

^ permalink raw reply

* Re: [PATCH v2 0/6] setting the table for integration of cpuidle with the scheduler
From: Peter Zijlstra @ 2014-01-30  9:28 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <1391017513-12995-1-git-send-email-nicolas.pitre@linaro.org>

On Wed, Jan 29, 2014 at 12:45:07PM -0500, Nicolas Pitre wrote:
> As everyone should know by now, we want to integrate the cpuidle
> governor with the scheduler for a more efficient idling of CPUs.
> In order to help the transition, this small patch series moves the
> existing interaction with cpuidle from architecture code to generic
> core code.  The ARM, PPC, SH and X86 architectures are concerned.
> No functional change should have occurred yet.
> 
> @peterz: Are you willing to pick up those patches?

Yeah.. no objections. Should I pick these up or will you be sending
another round?

^ permalink raw reply

* Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
From: Alexander Graf @ 2014-01-30 10:02 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxppc-dev@lists.ozlabs.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <20140130060000.GB10611@iris.ozlabs.ibm.com>



> Am 30.01.2014 um 07:00 schrieb Paul Mackerras <paulus@samba.org>:
>=20
>> On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote:
>> We allow priv-mode update of this. The guest value is saved in fscr,
>> and the value actually used is saved in shadow_fscr. shadow_fscr
>> only contains values that are allowed by the host. On
>> facility unavailable interrupt, if the facility is allowed by fscr
>> but disabled in shadow_fscr we need to emulate the support. Currently
>> all but EBB is disabled. We still don't support performance monitoring
>> in PR guest.
>=20
> ...
>=20
>> +    /*
>> +     * Save the current fscr in shadow fscr
>> +     */
>> +    mfspr r3,SPRN_FSCR
>> +    PPC_STL r3, VCPU_SHADOW_FSCR(r7)
>=20
> I don't think you need to do this.  What could possibly have changed
> FSCR since we loaded it on the way into the guest?

The interrupt cause is part of fscr. But yes, we only meed to store that on a=
n fscr interrupt.

Do we use anything from fscr inside the kernel? Could we switch it lazily on=
 vcpu_load/put?

Alex

>=20
> Paul.

^ permalink raw reply

* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Alexander Graf @ 2014-01-30 10:04 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxppc-dev@lists.ozlabs.org, Aneesh Kumar K.V,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
In-Reply-To: <20140130054913.GA10611@iris.ozlabs.ibm.com>



> Am 30.01.2014 um 06:49 schrieb Paul Mackerras <paulus@samba.org>:
>=20
>> On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote:
>> virtual time base register is a per vm register and need to saved
>> and restored on vm exit and entry. Writing to VTB is not allowed
>> in the privileged mode.
> ...
>=20
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +#define mfvtb()        ({unsigned long rval;                \
>> +            asm volatile("mfspr %0, %1" :            \
>> +                     "=3Dr" (rval) : "i" (SPRN_VTB)); rval;})
>=20
> The mfspr will be a no-op on anything before POWER8, meaning the
> result will be whatever value was in the destination GPR before the
> mfspr.  I suppose that may not matter if the result is only ever used
> when we're running on a POWER8 host, but I would feel more comfortable
> if we had explicit feature tests to make sure of that, rather than
> possibly doing computations with unpredictable values.
>=20
> With your patch, a guest on a POWER7 or a PPC970 could do a read from
> VTB and get garbage -- first, there is nothing to stop userspace from
> requesting POWER8 emulation on an older machine, and secondly, even if
> the virtual machine is a PPC970 (say) you don't implement
> unimplemented SPR semantics for VTB (no-op if PR=3D0, illegal
> instruction interrupt if PR=3D1).
>=20
> On the whole I think it is reasonable to reject an attempt to set the
> virtual PVR to a POWER8 PVR value if we are not running on a POWER8
> host, because emulating all the new POWER8 features in software
> (particularly transactional memory) would not be feasible.  Alex may
> disagree. :)

We don't have a good feature flag indicator that tells kvm what the guest cp=
u is capable of. So yes, I think it's reasonable to just not expose p8 regis=
ters on p8 for now.

In theory it's of course possible to emulate a lot of p8 features on pre-p8 h=
ardware, but I'm not sure it's worth the effort. If anyone wants to spend th=
e time to work on it I'd be happy to tale patches though ;)

Alex

>=20
> Paul.

^ permalink raw reply

* PCIe Access - achieve bursts without DMA
From: Moese, Michael @ 2014-01-30 12:20 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

Hello PPC-developers,
I'm currently trying to benchmark access speeds to our PCIe-connected IP-co=
res
located inside our FPGA. On x86-based systems I was able to achieve bursts =
for
both read and write access. On PPC32, using an e500v2, I had no success at =
all=20
so far.=20
I tried using ioremap_wc(), like I did on x86, for writing, and it only res=
ults in my
writes just being single requests, one after another.
For reads, I noticed I could not ioremap_cache() on PPC, so I used simple i=
oremap()
here.=20
I used several ways to read from the device, from simple readl(),memcpy_fro=
m_io(),=20
memcpy()  to cacheable_memcpy() - with no improvements.  Even when just iss=
uing
a batch of prefetch()-calls for all the memory to read did not result in re=
ad bursts.

I only get really poor results, writing is possible with around 40 MiByte/s=
, whereas I =20
can read at about only 3 MiByte/s.
After hours of studying the reference manual from freescale, looking into o=
ther code
and searching the web, I'm close to resignation.

Maybe someone of you has some more directions for me, I'd appreciate every =
hint
that leads me to my problem's solution - maybe I just missed something or l=
ack=20
knowledge about this architecture in general.

Thanks for your reading.


Michael

^ permalink raw reply

* Re: [PATCH 2/2] Fix compile error of pgtable-ppc64.h
From: Greg KH @ 2014-01-30 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, paulus, Aneesh Kumar K.V, stable, Li Zhong
In-Reply-To: <1391036256.8524.81.camel@pasglop>

On Thu, Jan 30, 2014 at 09:57:36AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2014-01-29 at 10:45 -0800, Greg KH wrote:
> > On Tue, Jan 28, 2014 at 05:52:42PM +0530, Aneesh Kumar K.V wrote:
> > > From: Li Zhong <zhong@linux.vnet.ibm.com>
> > > 
> > > It seems that forward declaration couldn't work well with typedef, use
> > > struct spinlock directly to avoiding following build errors:
> > > 
> > > In file included from include/linux/spinlock.h:81,
> > >                  from include/linux/seqlock.h:35,
> > >                  from include/linux/time.h:5,
> > >                  from include/uapi/linux/timex.h:56,
> > >                  from include/linux/timex.h:56,
> > >                  from include/linux/sched.h:17,
> > >                  from arch/powerpc/kernel/asm-offsets.c:17:
> > > include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
> > > /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
> > > 
> > > build fix for upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
> > > for 3.13 stable series
> > 
> > I don't understand, why is this needed?  Is there a corrisponding patch
> > upstream that already does this?  What went wrong with a "normal"
> > backport of the patch to 3.13?
> 
> There's a corresponding patch in powerpc-next that I'm about to send to
> Linus today, but for the backport, the "fix" could be folded into the
> original offending patch.

Oh come on, you know better than to try to send me a patch that isn't in
Linus's tree already.  Crap, I can't take that at all.

Send me the git commit id when it is in Linus's tree, otherwise I'm not
taking it.

And no, don't "fold in" anything, that's not ok either.  I'll just go
drop this patch entirely from all of my -stable trees for now.  Feel
free to resend them when all of the needed stuff is upstream.

greg k-h

^ permalink raw reply

* [PATCH] powerpc/eeh: drop taken reference to driver on eeh_rmv_device
From: Thadeu Lima de Souza Cascardo @ 2014-01-30 13:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: paulus, shangw, Thadeu Lima de Souza Cascardo

Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use
partial hotplug for EEH unaware drivers") introduces eeh_rmv_device,
which may grab a reference to a driver, but not release it.

That prevents a driver from being removed after it has gone through EEH
recovery.

This patch drops the reference in either exit path if it was taken.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7bb30dc..afe7337 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -364,7 +364,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
 		return NULL;
 	driver = eeh_pcid_get(dev);
 	if (driver && driver->err_handler)
-		return NULL;
+		goto out;
 
 	/* Remove it from PCI subsystem */
 	pr_debug("EEH: Removing %s without EEH sensitive driver\n",
@@ -377,6 +377,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	pci_stop_and_remove_bus_device(dev);
 	pci_unlock_rescan_remove();
 
+out:
+	if (driver)
+		eeh_pcid_put(dev);
 	return NULL;
 }
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH v2 0/6] setting the table for integration of cpuidle with the scheduler
From: Nicolas Pitre @ 2014-01-30 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <20140130092835.GL11314@laptop.programming.kicks-ass.net>

On Thu, 30 Jan 2014, Peter Zijlstra wrote:

> On Wed, Jan 29, 2014 at 12:45:07PM -0500, Nicolas Pitre wrote:
> > As everyone should know by now, we want to integrate the cpuidle
> > governor with the scheduler for a more efficient idling of CPUs.
> > In order to help the transition, this small patch series moves the
> > existing interaction with cpuidle from architecture code to generic
> > core code.  The ARM, PPC, SH and X86 architectures are concerned.
> > No functional change should have occurred yet.
> > 
> > @peterz: Are you willing to pick up those patches?
> 
> Yeah.. no objections. Should I pick these up or will you be sending
> another round?

I think you could pick them now, taking care of picking up the amended 
#1/6.


Nicolas

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Daniel Lezcano @ 2014-01-30 13:44 UTC (permalink / raw)
  To: Nicolas Pitre, Preeti U Murthy
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra, linux-sh,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Olof Johansson,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401300021530.1652@knanqh.ubzr>

On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
> On Thu, 30 Jan 2014, Preeti U Murthy wrote:
>
>> Hi Nicolas,
>>
>> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
>>> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
>>>
>>>> In order to integrate cpuidle with the scheduler, we must have a better
>>>> proximity in the core code with what cpuidle is doing and not delegate
>>>> such interaction to arch code.
>>>>
>>>> Architectures implementing arch_cpu_idle() should simply enter
>>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>>
>>>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>
>>> As mentioned in my reply to Olof's comment on patch #5/6, here's a new
>>> version of this patch adding the safety local_irq_enable() to the core
>>> code.
>>>
>>> ----- >8
>>>
>>> From: Nicolas Pitre <nicolas.pitre@linaro.org>
>>> Subject: idle: move the cpuidle entry point to the generic idle loop
>>>
>>> In order to integrate cpuidle with the scheduler, we must have a better
>>> proximity in the core code with what cpuidle is doing and not delegate
>>> such interaction to arch code.
>>>
>>> Architectures implementing arch_cpu_idle() should simply enter
>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>
>>> In both cases i.e. whether it is a cpuidle driver or the default
>>> arch_cpu_idle(), the calling convention expects IRQs to be disabled
>>> on entry and enabled on exit. There is a warning in place already but
>>> let's add a forced IRQ enable here as well.  This will allow for
>>> removing the forced IRQ enable some implementations do locally and
>>
>> Why would this patch allow for removing the forced IRQ enable that are
>> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
>> the default arch_cpu_idle() to have re-enabled the interrupts after
>> exiting from the default idle state? Its supposed to only catch faulty
>> cpuidle drivers that haven't enabled IRQs on exit from idle state but
>> are expected to have done so, isn't it?
>
> Exact.  However x86 currently does this:
>
> 	if (cpuidle_idle_call())
> 	        x86_idle();
> 	else
> 	        local_irq_enable();
>
> So whenever cpuidle_idle_call() is successful then IRQs are
> unconditionally enabled whether or not the underlying cpuidle driver has
> properly done it or not.  And the reason is that some of the x86 cpuidle
> do fail to enable IRQs before returning.
>
> So the idea is to get rid of this unconditional IRQ enabling and let the
> core issue a warning instead (as well as enabling IRQs to allow the
> system to run).

But what I don't get with your comment is the local_irq_enable is done 
from the cpuidle common framework in 'cpuidle_enter_state' it is not 
done from the arch specific backend cpuidle driver.

So the code above could be:

	if (cpuidle_idle_call())
		x86_idle();

without the else section, this local_irq_enable is pointless. Or may be 
I missed something ?


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply

* RE: PCIe Access - achieve bursts without DMA
From: David Laight @ 2014-01-30 14:19 UTC (permalink / raw)
  To: 'Moese, Michael', linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2DF74D4E746FF14C8697D5041AAE72D56A2B1420@MEN-EX2.intra.men.de>

>From Moese, Michael
> Hello PPC-developers,
> I'm currently trying to benchmark access speeds to our PCIe-connected IP-=
cores
> located inside our FPGA. On x86-based systems I was able to achieve burst=
s for
> both read and write access. On PPC32, using an e500v2, I had no success a=
t all
> so far.

I'm not sure that you can.
I had to write a simple driver for the PCIe CSB bridge dma on a 83xx ppc.
I think that might be the one in the e500v2.

I don't know how fast 'normal' PCIe slaves are, but we were accessing
an Altera fpga and the latency is less than pedestrian.
I think an ISA bus can run faster!
With moderate length transfers, the throughput was more than adequate.

	David

^ permalink raw reply

* [PATCH v2] kexec/ppc64 fix device tree endianess issues for memory attributes
From: Laurent Dufour @ 2014-01-30 15:06 UTC (permalink / raw)
  To: Simon Horman, kexec; +Cc: linuxppc-dev, Anton Blanchard, Mahesh Salgaonkar

All the attributes exposed in the device tree are in Big Endian format.

This patch add the byte swap operation for some entries which were not yet
processed, including those fixed by the following kernel's patch :

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-January/114720.html

To work on PPC64 Little Endian mode, kexec now requires that the kernel's
patch mentioned above is applied on the kexecing kernel.

Tested on ppc64 LPAR (kexec/dump) and ppc64le in a Qemu/KVM guest (kexec)

Changes from v1 :
 * add processing of the following entries :
   - ibm,dynamic-reconfiguration-memory
   - chosen/linux,kernel-end
   - chosen/linux,crashkernel-base & size
   - chosen/linux,memory-limit
   - chosen/linux,htab-base & size
   - linux,tce-base & size
   - memory@/reg
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 kexec/arch/ppc64/crashdump-ppc64.c |    9 ++++---
 kexec/arch/ppc64/kexec-ppc64.c     |   44 +++++++++++++++++++++++++++---------
 kexec/fs2dt.c                      |   19 ++++++++--------
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/kexec/arch/ppc64/crashdump-ppc64.c b/kexec/arch/ppc64/crashdump-ppc64.c
index e31dd6d..c0d575d 100644
--- a/kexec/arch/ppc64/crashdump-ppc64.c
+++ b/kexec/arch/ppc64/crashdump-ppc64.c
@@ -146,12 +146,12 @@ static int get_dyn_reconf_crash_memory_ranges(void)
 			return -1;
 		}
 
-		start = ((uint64_t *)buf)[DRCONF_ADDR];
+		start = be64_to_cpu(((uint64_t *)buf)[DRCONF_ADDR]);
 		end = start + lmb_size;
 		if (start == 0 && end >= (BACKUP_SRC_END + 1))
 			start = BACKUP_SRC_END + 1;
 
-		flags = (*((uint32_t *)&buf[DRCONF_FLAGS]));
+		flags = be32_to_cpu((*((uint32_t *)&buf[DRCONF_FLAGS])));
 		/* skip this block if the reserved bit is set in flags (0x80)
 		   or if the block is not assigned to this partition (0x8) */
 		if ((flags & 0x80) || !(flags & 0x8))
@@ -252,8 +252,9 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
 				goto err;
 			}
 
-			start = ((unsigned long long *)buf)[0];
-			end = start + ((unsigned long long *)buf)[1];
+			start = be64_to_cpu(((unsigned long long *)buf)[0]);
+			end = start +
+				be64_to_cpu(((unsigned long long *)buf)[1]);
 			if (start == 0 && end >= (BACKUP_SRC_END + 1))
 				start = BACKUP_SRC_END + 1;
 
diff --git a/kexec/arch/ppc64/kexec-ppc64.c b/kexec/arch/ppc64/kexec-ppc64.c
index af9112b..49b291d 100644
--- a/kexec/arch/ppc64/kexec-ppc64.c
+++ b/kexec/arch/ppc64/kexec-ppc64.c
@@ -167,7 +167,7 @@ static int get_dyn_reconf_base_ranges(void)
 	 * lmb_size, num_of_lmbs(global variables) are
 	 * initialized once here.
 	 */
-	lmb_size = ((uint64_t *)buf)[0];
+	lmb_size = be64_to_cpu(((uint64_t *)buf)[0]);
 	fclose(file);
 
 	strcpy(fname, "/proc/device-tree/");
@@ -183,7 +183,7 @@ static int get_dyn_reconf_base_ranges(void)
 		fclose(file);
 		return -1;
 	}
-	num_of_lmbs = ((unsigned int *)buf)[0];
+	num_of_lmbs = be32_to_cpu(((unsigned int *)buf)[0]);
 
 	for (i = 0; i < num_of_lmbs; i++) {
 		if ((n = fread(buf, 1, 24, file)) < 0) {
@@ -194,7 +194,7 @@ static int get_dyn_reconf_base_ranges(void)
 		if (nr_memory_ranges >= max_memory_ranges)
 			return -1;
 
-		start = ((uint64_t *)buf)[0];
+		start = be64_to_cpu(((uint64_t *)buf)[0]);
 		end = start + lmb_size;
 		add_base_memory_range(start, end);
 	}
@@ -278,8 +278,8 @@ static int get_base_ranges(void)
 				if (realloc_memory_ranges() < 0)
 					break;
 			}
-			start = ((uint64_t *)buf)[0];
-			end = start + ((uint64_t *)buf)[1];
+			start =  be64_to_cpu(((uint64_t *)buf)[0]);
+			end = start + be64_to_cpu(((uint64_t *)buf)[1]);
 			add_base_memory_range(start, end);
 			fclose(file);
 		}
@@ -363,6 +363,7 @@ static int get_devtree_details(unsigned long kexec_flags)
 				goto error_openfile;
 			}
 			fclose(file);
+			kernel_end = be64_to_cpu(kernel_end);
 
 			/* Add kernel memory to exclude_range */
 			exclude_range[i].start = 0x0UL;
@@ -386,6 +387,7 @@ static int get_devtree_details(unsigned long kexec_flags)
 					goto error_openfile;
 				}
 				fclose(file);
+				crash_base = be64_to_cpu(crash_base);
 
 				memset(fname, 0, sizeof(fname));
 				strcpy(fname, device_tree);
@@ -400,6 +402,8 @@ static int get_devtree_details(unsigned long kexec_flags)
 					perror(fname);
 					goto error_openfile;
 				}
+				fclose(file);
+				crash_size = be64_to_cpu(crash_size);
 
 				if (crash_base > mem_min)
 					mem_min = crash_base;
@@ -430,10 +434,14 @@ static int get_devtree_details(unsigned long kexec_flags)
 				 * fall through. On older kernel this file
 				 * is not present.
 				 */
-			} else if (fread(&memory_limit, sizeof(uint64_t), 1,
-								file) != 1) {
-				perror(fname);
-				goto error_openfile;
+			} else {
+				if (fread(&memory_limit, sizeof(uint64_t), 1,
+					  file) != 1) {
+					perror(fname);
+					goto error_openfile;
+				}
+				fclose(file);
+				memory_limit = be64_to_cpu(memory_limit);
 			}
 
 			memset(fname, 0, sizeof(fname));
@@ -454,6 +462,9 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
+			fclose(file);
+			htab_base = be64_to_cpu(htab_base);
+
 			memset(fname, 0, sizeof(fname));
 			strcpy(fname, device_tree);
 			strcat(fname, dentry->d_name);
@@ -466,6 +477,9 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
+			fclose(file);
+			htab_size = be64_to_cpu(htab_size);
+
 			/* Add htab address to exclude_range - NON-LPAR only */
 			exclude_range[i].start = htab_base;
 			exclude_range[i].end = htab_base + htab_size;
@@ -492,6 +506,7 @@ static int get_devtree_details(unsigned long kexec_flags)
 					perror(fname);
 					goto error_openfile;
 				}
+				initrd_start = be64_to_cpu(initrd_start);
 				fclose(file);
 
 				memset(fname, 0, sizeof(fname));
@@ -511,6 +526,7 @@ static int get_devtree_details(unsigned long kexec_flags)
 					perror(fname);
 					goto error_openfile;
 				}
+				initrd_end = be64_to_cpu(initrd_end);
 				fclose(file);
 
 				/* Add initrd address to exclude_range */
@@ -532,6 +548,7 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
+			fclose(file);
 			rtas_base = be32_to_cpu(rtas_base);
 			memset(fname, 0, sizeof(fname));
 			strcpy(fname, device_tree);
@@ -545,6 +562,7 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
+			fclose(file);
 			closedir(cdir);
 			rtas_size = be32_to_cpu(rtas_size);
 			/* Add rtas to exclude_range */
@@ -568,8 +586,8 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
-			rmo_base = ((uint64_t *)buf)[0];
-			rmo_top = rmo_base + ((uint64_t *)buf)[1];
+			rmo_base = be64_to_cpu(((uint64_t *)buf)[0]);
+			rmo_top = rmo_base + be64_to_cpu(((uint64_t *)buf)[1]);
 			if (rmo_top > 0x30000000UL)
 				rmo_top = 0x30000000UL;
 
@@ -593,6 +611,8 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
+			fclose(file);
+			tce_base = be64_to_cpu(tce_base);
 			memset(fname, 0, sizeof(fname));
 			strcpy(fname, device_tree);
 			strcat(fname, dentry->d_name);
@@ -605,6 +625,8 @@ static int get_devtree_details(unsigned long kexec_flags)
 				perror(fname);
 				goto error_openfile;
 			}
+			fclose(file);
+			tce_size = be32_to_cpu(tce_size);
 			/* Add tce to exclude_range - NON-LPAR only */
 			exclude_range[i].start = tce_base;
 			exclude_range[i].end = tce_base + tce_size;
diff --git a/kexec/fs2dt.c b/kexec/fs2dt.c
index 7202dc1..73c1fb9 100644
--- a/kexec/fs2dt.c
+++ b/kexec/fs2dt.c
@@ -197,7 +197,7 @@ static void add_dyn_reconf_usable_mem_property__(int fd)
 			die("unrecoverable error: error reading \"%s\": %s\n",
 				pathname, strerror(errno));
 
-		base = (uint64_t) buf[0];
+		base = be64_to_cpu((uint64_t) buf[0]);
 		end = base + lmb_size;
 		if (~0ULL - base < end)
 			die("unrecoverable error: mem property overflow\n");
@@ -229,8 +229,8 @@ static void add_dyn_reconf_usable_mem_property__(int fd)
 						    " ranges.\n",
 						    ranges_size*8);
 				}
-				ranges[rlen++] = loc_base;
-				ranges[rlen++] = loc_end - loc_base;
+				ranges[rlen++] = cpu_to_be64(loc_base);
+				ranges[rlen++] = cpu_to_be64(loc_end - loc_base);
 				rngs_cnt++;
 			}
 		}
@@ -255,7 +255,7 @@ static void add_dyn_reconf_usable_mem_property__(int fd)
 			}
 		} else {
 			/* Store the count of (base, size) duple */
-			ranges[tmp_indx] = rngs_cnt;
+			ranges[tmp_indx] = cpu_to_be64((uint64_t) rngs_cnt);
 		}
 	}
 		
@@ -309,10 +309,11 @@ static void add_usable_mem_property(int fd, size_t len)
 		die("unrecoverable error: error reading \"%s\": %s\n",
 		    pathname, strerror(errno));
 
-	if (~0ULL - buf[0] < buf[1])
-		die("unrecoverable error: mem property overflow\n");
 	base = be64_to_cpu(buf[0]);
-	end = base + be64_to_cpu(buf[1]);
+	end = be64_to_cpu(buf[1]);
+	if (~0ULL - base < end)
+		die("unrecoverable error: mem property overflow\n");
+	end += base;
 
 	ranges = malloc(ranges_size * sizeof(*ranges));
 	if (!ranges)
@@ -342,8 +343,8 @@ static void add_usable_mem_property(int fd, size_t len)
 					    "%d bytes for ranges.\n",
 					    ranges_size*sizeof(*ranges));
 			}
-			ranges[rlen++] = loc_base;
-			ranges[rlen++] = loc_end - loc_base;
+			ranges[rlen++] = cpu_to_be64(loc_base);
+			ranges[rlen++] = cpu_to_be64(loc_end - loc_base);
 		}
 	}
 

^ permalink raw reply related

* Re: [PATCH v2 6/6] cpu/idle.c: move to sched/idle.c
From: Peter Zijlstra @ 2014-01-30 15:25 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <1391017513-12995-7-git-send-email-nicolas.pitre@linaro.org>

On Wed, Jan 29, 2014 at 12:45:13PM -0500, Nicolas Pitre wrote:
> Integration of cpuidle with the scheduler requires that the idle loop be
> closely integrated with the scheduler proper. Moving cpu/idle.c into the
> sched directory will allow for a smoother integration, and eliminate a
> subdirectory which contained only one source file.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  kernel/Makefile              | 1 -
>  kernel/cpu/Makefile          | 1 -
>  kernel/sched/Makefile        | 2 +-
>  kernel/{cpu => sched}/idle.c | 0
>  4 files changed, 1 insertion(+), 3 deletions(-)
>  delete mode 100644 kernel/cpu/Makefile
>  rename kernel/{cpu => sched}/idle.c (100%)

> --- a/kernel/sched/Makefile
> +++ b/kernel/sched/Makefile
> @@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
>  CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
>  endif
>  
> -obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
> +obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
>  obj-y += wait.o completion.o
>  obj-$(CONFIG_SMP) += cpupri.o
>  obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
> diff --git a/kernel/cpu/idle.c b/kernel/sched/idle.c
> similarity index 100%
> rename from kernel/cpu/idle.c
> rename to kernel/sched/idle.c

This is not a valid patch for PATCH(1). Please try again.

^ permalink raw reply

* [RESEND PATCH] powerpc/relocate fix relocate processing in LE mode
From: Laurent Dufour @ 2014-01-30 15:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev

Relocation's code is not working in little endian mode because the r_info
field, which is a 64 bits value, should be read from the right offset.

The current code is optimized to read the r_info field as a 32 bits value
starting at the middle of the double word (offset 12). When running in LE
mode, the read value is not correct since only the MSB is read.

This patch removes this optimization which consist to deal with a 32 bits
value instead of a 64 bits one. This way it works in big and little endian
mode.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/reloc_64.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index b47a0e1..1482327 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -69,8 +69,8 @@ _GLOBAL(relocate)
 	 * R_PPC64_RELATIVE ones.
 	 */
 	mtctr	r8
-5:	lwz	r0,12(9)	/* ELF64_R_TYPE(reloc->r_info) */
-	cmpwi	r0,R_PPC64_RELATIVE
+5:	ld	r0,8(9)		/* ELF64_R_TYPE(reloc->r_info) */
+	cmpdi	r0,R_PPC64_RELATIVE
 	bne	6f
 	ld	r6,0(r9)	/* reloc->r_offset */
 	ld	r0,16(r9)	/* reloc->r_addend */

^ permalink raw reply related

* Re: [PATCH v2 6/6] cpu/idle.c: move to sched/idle.c
From: Nicolas Pitre @ 2014-01-30 16:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <20140130152500.GB5002@laptop.programming.kicks-ass.net>

On Thu, 30 Jan 2014, Peter Zijlstra wrote:

> On Wed, Jan 29, 2014 at 12:45:13PM -0500, Nicolas Pitre wrote:
> > Integration of cpuidle with the scheduler requires that the idle loop be
> > closely integrated with the scheduler proper. Moving cpu/idle.c into the
> > sched directory will allow for a smoother integration, and eliminate a
> > subdirectory which contained only one source file.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > ---
> >  kernel/Makefile              | 1 -
> >  kernel/cpu/Makefile          | 1 -
> >  kernel/sched/Makefile        | 2 +-
> >  kernel/{cpu => sched}/idle.c | 0
> >  4 files changed, 1 insertion(+), 3 deletions(-)
> >  delete mode 100644 kernel/cpu/Makefile
> >  rename kernel/{cpu => sched}/idle.c (100%)
> 
> > --- a/kernel/sched/Makefile
> > +++ b/kernel/sched/Makefile
> > @@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
> >  CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
> >  endif
> >  
> > -obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
> > +obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
> >  obj-y += wait.o completion.o
> >  obj-$(CONFIG_SMP) += cpupri.o
> >  obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
> > diff --git a/kernel/cpu/idle.c b/kernel/sched/idle.c
> > similarity index 100%
> > rename from kernel/cpu/idle.c
> > rename to kernel/sched/idle.c
> 
> This is not a valid patch for PATCH(1). Please try again.

Don't you use git?  ;-)

Here's a plain patch:

----- >8

>From 1bf40eb80a44633094e94986a74bd5ffa222f9d4 Mon Sep 17 00:00:00 2001
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Sun, 26 Jan 2014 23:42:01 -0500
Subject: [PATCH] cpu/idle.c: move to sched/idle.c

Integration of cpuidle with the scheduler requires that the idle loop be
closely integrated with the scheduler proper. Moving cpu/idle.c into the
sched directory will allow for a smoother integration, and eliminate a
subdirectory which contained only one source file.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 kernel/Makefile       |   1 -
 kernel/cpu/Makefile   |   1 -
 kernel/cpu/idle.c     | 144 --------------------------------------------------
 kernel/sched/Makefile |   2 +-
 kernel/sched/idle.c   | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 147 deletions(-)
 delete mode 100644 kernel/cpu/Makefile
 delete mode 100644 kernel/cpu/idle.c
 create mode 100644 kernel/sched/idle.c

diff --git a/kernel/Makefile b/kernel/Makefile
index bc010ee272..6f1c7e5cfc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -22,7 +22,6 @@ obj-y += sched/
 obj-y += locking/
 obj-y += power/
 obj-y += printk/
-obj-y += cpu/
 obj-y += irq/
 obj-y += rcu/
 
diff --git a/kernel/cpu/Makefile b/kernel/cpu/Makefile
deleted file mode 100644
index 59ab052ef7..0000000000
--- a/kernel/cpu/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-y	= idle.o
diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
deleted file mode 100644
index 14ca43430a..0000000000
--- a/kernel/cpu/idle.c
+++ /dev/null
@@ -1,144 +0,0 @@
-/*
- * Generic entry point for the idle threads
- */
-#include <linux/sched.h>
-#include <linux/cpu.h>
-#include <linux/cpuidle.h>
-#include <linux/tick.h>
-#include <linux/mm.h>
-#include <linux/stackprotector.h>
-
-#include <asm/tlb.h>
-
-#include <trace/events/power.h>
-
-static int __read_mostly cpu_idle_force_poll;
-
-void cpu_idle_poll_ctrl(bool enable)
-{
-	if (enable) {
-		cpu_idle_force_poll++;
-	} else {
-		cpu_idle_force_poll--;
-		WARN_ON_ONCE(cpu_idle_force_poll < 0);
-	}
-}
-
-#ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
-static int __init cpu_idle_poll_setup(char *__unused)
-{
-	cpu_idle_force_poll = 1;
-	return 1;
-}
-__setup("nohlt", cpu_idle_poll_setup);
-
-static int __init cpu_idle_nopoll_setup(char *__unused)
-{
-	cpu_idle_force_poll = 0;
-	return 1;
-}
-__setup("hlt", cpu_idle_nopoll_setup);
-#endif
-
-static inline int cpu_idle_poll(void)
-{
-	rcu_idle_enter();
-	trace_cpu_idle_rcuidle(0, smp_processor_id());
-	local_irq_enable();
-	while (!tif_need_resched())
-		cpu_relax();
-	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
-	rcu_idle_exit();
-	return 1;
-}
-
-/* Weak implementations for optional arch specific functions */
-void __weak arch_cpu_idle_prepare(void) { }
-void __weak arch_cpu_idle_enter(void) { }
-void __weak arch_cpu_idle_exit(void) { }
-void __weak arch_cpu_idle_dead(void) { }
-void __weak arch_cpu_idle(void)
-{
-	cpu_idle_force_poll = 1;
-	local_irq_enable();
-}
-
-/*
- * Generic idle loop implementation
- */
-static void cpu_idle_loop(void)
-{
-	while (1) {
-		tick_nohz_idle_enter();
-
-		while (!need_resched()) {
-			check_pgt_cache();
-			rmb();
-
-			if (cpu_is_offline(smp_processor_id()))
-				arch_cpu_idle_dead();
-
-			local_irq_disable();
-			arch_cpu_idle_enter();
-
-			/*
-			 * In poll mode we reenable interrupts and spin.
-			 *
-			 * Also if we detected in the wakeup from idle
-			 * path that the tick broadcast device expired
-			 * for us, we don't want to go deep idle as we
-			 * know that the IPI is going to arrive right
-			 * away
-			 */
-			if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
-				cpu_idle_poll();
-			} else {
-				if (!current_clr_polling_and_test()) {
-					stop_critical_timings();
-					rcu_idle_enter();
-					if (cpuidle_idle_call())
-						arch_cpu_idle();
-					if (WARN_ON_ONCE(irqs_disabled()))
-						local_irq_enable();
-					rcu_idle_exit();
-					start_critical_timings();
-				} else {
-					local_irq_enable();
-				}
-				__current_set_polling();
-			}
-			arch_cpu_idle_exit();
-			/*
-			 * We need to test and propagate the TIF_NEED_RESCHED
-			 * bit here because we might not have send the
-			 * reschedule IPI to idle tasks.
-			 */
-			if (tif_need_resched())
-				set_preempt_need_resched();
-		}
-		tick_nohz_idle_exit();
-		schedule_preempt_disabled();
-	}
-}
-
-void cpu_startup_entry(enum cpuhp_state state)
-{
-	/*
-	 * This #ifdef needs to die, but it's too late in the cycle to
-	 * make this generic (arm and sh have never invoked the canary
-	 * init for the non boot cpus!). Will be fixed in 3.11
-	 */
-#ifdef CONFIG_X86
-	/*
-	 * If we're the non-boot CPU, nothing set the stack canary up
-	 * for us. The boot CPU already has it initialized but no harm
-	 * in doing it again. This is a good place for updating it, as
-	 * we wont ever return from this function (so the invalid
-	 * canaries already on the stack wont ever trigger).
-	 */
-	boot_init_stack_canary();
-#endif
-	__current_set_polling();
-	arch_cpu_idle_prepare();
-	cpu_idle_loop();
-}
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 7b621409cf..ac3e0ea68f 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
 obj-y += wait.o completion.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
new file mode 100644
index 0000000000..14ca43430a
--- /dev/null
+++ b/kernel/sched/idle.c
@@ -0,0 +1,144 @@
+/*
+ * Generic entry point for the idle threads
+ */
+#include <linux/sched.h>
+#include <linux/cpu.h>
+#include <linux/cpuidle.h>
+#include <linux/tick.h>
+#include <linux/mm.h>
+#include <linux/stackprotector.h>
+
+#include <asm/tlb.h>
+
+#include <trace/events/power.h>
+
+static int __read_mostly cpu_idle_force_poll;
+
+void cpu_idle_poll_ctrl(bool enable)
+{
+	if (enable) {
+		cpu_idle_force_poll++;
+	} else {
+		cpu_idle_force_poll--;
+		WARN_ON_ONCE(cpu_idle_force_poll < 0);
+	}
+}
+
+#ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
+static int __init cpu_idle_poll_setup(char *__unused)
+{
+	cpu_idle_force_poll = 1;
+	return 1;
+}
+__setup("nohlt", cpu_idle_poll_setup);
+
+static int __init cpu_idle_nopoll_setup(char *__unused)
+{
+	cpu_idle_force_poll = 0;
+	return 1;
+}
+__setup("hlt", cpu_idle_nopoll_setup);
+#endif
+
+static inline int cpu_idle_poll(void)
+{
+	rcu_idle_enter();
+	trace_cpu_idle_rcuidle(0, smp_processor_id());
+	local_irq_enable();
+	while (!tif_need_resched())
+		cpu_relax();
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+	rcu_idle_exit();
+	return 1;
+}
+
+/* Weak implementations for optional arch specific functions */
+void __weak arch_cpu_idle_prepare(void) { }
+void __weak arch_cpu_idle_enter(void) { }
+void __weak arch_cpu_idle_exit(void) { }
+void __weak arch_cpu_idle_dead(void) { }
+void __weak arch_cpu_idle(void)
+{
+	cpu_idle_force_poll = 1;
+	local_irq_enable();
+}
+
+/*
+ * Generic idle loop implementation
+ */
+static void cpu_idle_loop(void)
+{
+	while (1) {
+		tick_nohz_idle_enter();
+
+		while (!need_resched()) {
+			check_pgt_cache();
+			rmb();
+
+			if (cpu_is_offline(smp_processor_id()))
+				arch_cpu_idle_dead();
+
+			local_irq_disable();
+			arch_cpu_idle_enter();
+
+			/*
+			 * In poll mode we reenable interrupts and spin.
+			 *
+			 * Also if we detected in the wakeup from idle
+			 * path that the tick broadcast device expired
+			 * for us, we don't want to go deep idle as we
+			 * know that the IPI is going to arrive right
+			 * away
+			 */
+			if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
+				cpu_idle_poll();
+			} else {
+				if (!current_clr_polling_and_test()) {
+					stop_critical_timings();
+					rcu_idle_enter();
+					if (cpuidle_idle_call())
+						arch_cpu_idle();
+					if (WARN_ON_ONCE(irqs_disabled()))
+						local_irq_enable();
+					rcu_idle_exit();
+					start_critical_timings();
+				} else {
+					local_irq_enable();
+				}
+				__current_set_polling();
+			}
+			arch_cpu_idle_exit();
+			/*
+			 * We need to test and propagate the TIF_NEED_RESCHED
+			 * bit here because we might not have send the
+			 * reschedule IPI to idle tasks.
+			 */
+			if (tif_need_resched())
+				set_preempt_need_resched();
+		}
+		tick_nohz_idle_exit();
+		schedule_preempt_disabled();
+	}
+}
+
+void cpu_startup_entry(enum cpuhp_state state)
+{
+	/*
+	 * This #ifdef needs to die, but it's too late in the cycle to
+	 * make this generic (arm and sh have never invoked the canary
+	 * init for the non boot cpus!). Will be fixed in 3.11
+	 */
+#ifdef CONFIG_X86
+	/*
+	 * If we're the non-boot CPU, nothing set the stack canary up
+	 * for us. The boot CPU already has it initialized but no harm
+	 * in doing it again. This is a good place for updating it, as
+	 * we wont ever return from this function (so the invalid
+	 * canaries already on the stack wont ever trigger).
+	 */
+	boot_init_stack_canary();
+#endif
+	__current_set_polling();
+	arch_cpu_idle_prepare();
+	cpu_idle_loop();
+}
-- 
1.8.4.108.g55ea5f6

^ permalink raw reply related

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Nicolas Pitre @ 2014-01-30 16:07 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra, linux-sh,
	Rafael J. Wysocki, linux-kernel, Olof Johansson, Paul Mundt,
	Preeti U Murthy, Thomas Gleixner, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel
In-Reply-To: <52EA5720.8010000@linaro.org>

On Thu, 30 Jan 2014, Daniel Lezcano wrote:

> On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
> > On Thu, 30 Jan 2014, Preeti U Murthy wrote:
> >
> > > Hi Nicolas,
> > >
> > > On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
> > > > On Wed, 29 Jan 2014, Nicolas Pitre wrote:
> > > >
> > > > > In order to integrate cpuidle with the scheduler, we must have a
> > > > > better
> > > > > proximity in the core code with what cpuidle is doing and not delegate
> > > > > such interaction to arch code.
> > > > >
> > > > > Architectures implementing arch_cpu_idle() should simply enter
> > > > > a cheap idle mode in the absence of a proper cpuidle driver.
> > > > >
> > > > > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > > > > Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > > >
> > > > As mentioned in my reply to Olof's comment on patch #5/6, here's a new
> > > > version of this patch adding the safety local_irq_enable() to the core
> > > > code.
> > > >
> > > > ----- >8
> > > >
> > > > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > > > Subject: idle: move the cpuidle entry point to the generic idle loop
> > > >
> > > > In order to integrate cpuidle with the scheduler, we must have a better
> > > > proximity in the core code with what cpuidle is doing and not delegate
> > > > such interaction to arch code.
> > > >
> > > > Architectures implementing arch_cpu_idle() should simply enter
> > > > a cheap idle mode in the absence of a proper cpuidle driver.
> > > >
> > > > In both cases i.e. whether it is a cpuidle driver or the default
> > > > arch_cpu_idle(), the calling convention expects IRQs to be disabled
> > > > on entry and enabled on exit. There is a warning in place already but
> > > > let's add a forced IRQ enable here as well.  This will allow for
> > > > removing the forced IRQ enable some implementations do locally and
> > >
> > > Why would this patch allow for removing the forced IRQ enable that are
> > > being done on some archs in arch_cpu_idle()? Isn't this patch expecting
> > > the default arch_cpu_idle() to have re-enabled the interrupts after
> > > exiting from the default idle state? Its supposed to only catch faulty
> > > cpuidle drivers that haven't enabled IRQs on exit from idle state but
> > > are expected to have done so, isn't it?
> >
> > Exact.  However x86 currently does this:
> >
> >  if (cpuidle_idle_call())
> >          x86_idle();
> >  else
> >          local_irq_enable();
> >
> > So whenever cpuidle_idle_call() is successful then IRQs are
> > unconditionally enabled whether or not the underlying cpuidle driver has
> > properly done it or not.  And the reason is that some of the x86 cpuidle
> > do fail to enable IRQs before returning.
> >
> > So the idea is to get rid of this unconditional IRQ enabling and let the
> > core issue a warning instead (as well as enabling IRQs to allow the
> > system to run).
> 
> But what I don't get with your comment is the local_irq_enable is done from
> the cpuidle common framework in 'cpuidle_enter_state' it is not done from the
> arch specific backend cpuidle driver.

Oh well... This certainly means we'll have to clean this mess as some 
drivers do it on their own while some others don't.  Some drivers also 
loop on !need_resched() while some others simply return on the first 
interrupt.

> So the code above could be:
> 
> 	if (cpuidle_idle_call())
> 		x86_idle();
> 
> without the else section, this local_irq_enable is pointless. Or may be I
> missed something ?

A later patch removes it anyway.  But if it is really necessary to 
enable interrupts then the core will do it but with a warning now.


Nicolas

^ permalink raw reply

* Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
From: Christoph Lameter @ 2014-01-30 16:26 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Han Pingtian, mpm, penberg, linux-mm, cody, paulus,
	Anton Blanchard, David Rientjes, Joonsoo Kim, linuxppc-dev,
	Wanpeng Li
In-Reply-To: <20140129223640.GA10101@linux.vnet.ibm.com>

On Wed, 29 Jan 2014, Nishanth Aravamudan wrote:

> exactly what the caller intends.
>
> int searchnode = node;
> if (node == NUMA_NO_NODE)
> 	searchnode = numa_mem_id();
> if (!node_present_pages(node))
> 	searchnode = local_memory_node(node);
>
> The difference in semantics from the previous is that here, if we have a
> memoryless node, rather than using the CPU's nearest NUMA node, we use
> the NUMA node closest to the requested one?

The idea here is that the page allocator will do the fallback to other
nodes. This check for !node_present should not be necessary. SLUB needs to
accept the page from whatever node the page allocator returned and work
with that.

The problem is the check for having a slab from the "right" node may fall
again after another attempt to allocate from the same node. SLUB will then
push the slab from the *wrong* node back to the partial lists and may
attempt another allocation that will again be successful but return memory
from another node. That way the partial lists from a particular node are
growing uselessly.

One way to solve this may be to check if memory is actually allocated
from the requested node and fallback to NUMA_NO_NODE (which will use the
last allocated slab) for future allocs if the page allocator returned
memory from a different node (unless GFP_THIS_NODE is set of course).
Otherwise we end up replicating  the page allocator logic in slub like in
slab. That is what I wanted to
avoid.

^ permalink raw reply

* Re: [PATCH v2 6/6] cpu/idle.c: move to sched/idle.c
From: Peter Zijlstra @ 2014-01-30 16:27 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-sh, linux-pm, Daniel Lezcano,
	Rafael J. Wysocki, linux-kernel, Paul Mundt, Preeti U Murthy,
	Thomas Gleixner, linuxppc-dev, Ingo Molnar, linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401301102210.1652@knanqh.ubzr>

On Thu, Jan 30, 2014 at 11:03:31AM -0500, Nicolas Pitre wrote:
> > This is not a valid patch for PATCH(1). Please try again.
> 
> Don't you use git?  ;-)

Nah, git and me don't get along well.

> Here's a plain patch:

Thanks!

^ permalink raw reply

* Re: [PATCH v2 6/6] cpu/idle.c: move to sched/idle.c
From: Joe Perches @ 2014-01-30 16:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolas Pitre, linaro-kernel, Russell King, linux-sh, linux-pm,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	Preeti U Murthy, Thomas Gleixner, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel
In-Reply-To: <20140130162753.GF5002@laptop.programming.kicks-ass.net>

On Thu, 2014-01-30 at 17:27 +0100, Peter Zijlstra wrote:
> On Thu, Jan 30, 2014 at 11:03:31AM -0500, Nicolas Pitre wrote:
> > > This is not a valid patch for PATCH(1). Please try again.
> > 
> > Don't you use git?  ;-)
> 
> Nah, git and me don't get along well.

Perhaps you could use a newer version of patch

http://savannah.gnu.org/forum/forum.php?forum_id=7361

GNU patch version 2.7 released

Item posted by Andreas Gruenbacher <agruen> on Wed 12 Sep 2012 02:18:14
PM UTC.

I am pleased to announce that version 2.7 of GNU patch has been
released. The following significant changes have happened since the last
stable release in December 2009: 

      * Support for most features of the "diff --git" format, including
        renames and copies, permission changes, and symlink diffs.
        Binary diffs are not supported yet; patch will complain and skip
        them.

^ permalink raw reply

* Re: [PATCH v2 6/6] cpu/idle.c: move to sched/idle.c
From: Peter Zijlstra @ 2014-01-30 16:52 UTC (permalink / raw)
  To: Joe Perches
  Cc: Nicolas Pitre, linaro-kernel, Russell King, linux-sh, linux-pm,
	Daniel Lezcano, Rafael J. Wysocki, linux-kernel, Paul Mundt,
	Preeti U Murthy, Thomas Gleixner, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel
In-Reply-To: <1391100076.2422.56.camel@joe-AO722>

On Thu, Jan 30, 2014 at 08:41:16AM -0800, Joe Perches wrote:
> Perhaps you could use a newer version of patch
> 
> GNU patch version 2.7 released

Yeah, I know about that, I'll wait until its common in all distros,
updating all machines I use by hand is just painful.

^ permalink raw reply

* Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
From: Daniel Lezcano @ 2014-01-30 17:28 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linaro-kernel, Russell King, linux-pm, Peter Zijlstra, linux-sh,
	Rafael J. Wysocki, linux-kernel, Olof Johansson, Paul Mundt,
	Preeti U Murthy, Thomas Gleixner, linuxppc-dev, Ingo Molnar,
	linux-arm-kernel
In-Reply-To: <alpine.LFD.2.11.1401301056180.1652@knanqh.ubzr>

On 01/30/2014 05:07 PM, Nicolas Pitre wrote:
> On Thu, 30 Jan 2014, Daniel Lezcano wrote:
>
>> On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
>>> On Thu, 30 Jan 2014, Preeti U Murthy wrote:
>>>
>>>> Hi Nicolas,
>>>>
>>>> On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
>>>>> On Wed, 29 Jan 2014, Nicolas Pitre wrote:
>>>>>
>>>>>> In order to integrate cpuidle with the scheduler, we must have a
>>>>>> better
>>>>>> proximity in the core code with what cpuidle is doing and not delegate
>>>>>> such interaction to arch code.
>>>>>>
>>>>>> Architectures implementing arch_cpu_idle() should simply enter
>>>>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>>>>
>>>>>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>>>>> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>>>
>>>>> As mentioned in my reply to Olof's comment on patch #5/6, here's a new
>>>>> version of this patch adding the safety local_irq_enable() to the core
>>>>> code.
>>>>>
>>>>> ----- >8
>>>>>
>>>>> From: Nicolas Pitre <nicolas.pitre@linaro.org>
>>>>> Subject: idle: move the cpuidle entry point to the generic idle loop
>>>>>
>>>>> In order to integrate cpuidle with the scheduler, we must have a better
>>>>> proximity in the core code with what cpuidle is doing and not delegate
>>>>> such interaction to arch code.
>>>>>
>>>>> Architectures implementing arch_cpu_idle() should simply enter
>>>>> a cheap idle mode in the absence of a proper cpuidle driver.
>>>>>
>>>>> In both cases i.e. whether it is a cpuidle driver or the default
>>>>> arch_cpu_idle(), the calling convention expects IRQs to be disabled
>>>>> on entry and enabled on exit. There is a warning in place already but
>>>>> let's add a forced IRQ enable here as well.  This will allow for
>>>>> removing the forced IRQ enable some implementations do locally and
>>>>
>>>> Why would this patch allow for removing the forced IRQ enable that are
>>>> being done on some archs in arch_cpu_idle()? Isn't this patch expecting
>>>> the default arch_cpu_idle() to have re-enabled the interrupts after
>>>> exiting from the default idle state? Its supposed to only catch faulty
>>>> cpuidle drivers that haven't enabled IRQs on exit from idle state but
>>>> are expected to have done so, isn't it?
>>>
>>> Exact.  However x86 currently does this:
>>>
>>>   if (cpuidle_idle_call())
>>>           x86_idle();
>>>   else
>>>           local_irq_enable();
>>>
>>> So whenever cpuidle_idle_call() is successful then IRQs are
>>> unconditionally enabled whether or not the underlying cpuidle driver has
>>> properly done it or not.  And the reason is that some of the x86 cpuidle
>>> do fail to enable IRQs before returning.
>>>
>>> So the idea is to get rid of this unconditional IRQ enabling and let the
>>> core issue a warning instead (as well as enabling IRQs to allow the
>>> system to run).
>>
>> But what I don't get with your comment is the local_irq_enable is done from
>> the cpuidle common framework in 'cpuidle_enter_state' it is not done from the
>> arch specific backend cpuidle driver.
>
> Oh well... This certainly means we'll have to clean this mess as some
> drivers do it on their own while some others don't.  Some drivers also
> loop on !need_resched() while some others simply return on the first
> interrupt.

Ok, I think the mess is coming from 'default_idle' which does not 
re-enable the local_irq but used from different places like 
amd_e400_idle and apm_cpu_idle.

void default_idle(void)
{
         trace_cpu_idle_rcuidle(1, smp_processor_id());
         safe_halt();
         trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
}

Considering the system configured without cpuidle because this one 
*always* enable the local irq, we have the different cases:

x86_idle = default_idle();
==> local_irq_enable is missing

x86_idle = amd_e400_idle();
==> it calls local_irq_disable(); but in the idle loop context where the 
local irqs are already disabled.
==> if amd_e400_c1e_detected is true, the local_irq are enabled
==> otherwise no
==> default_idle is called from there and does not enable local_irqs


>> So the code above could be:
>>
>> 	if (cpuidle_idle_call())
>> 		x86_idle();
>>
>> without the else section, this local_irq_enable is pointless. Or may be I
>> missed something ?
>
> A later patch removes it anyway.  But if it is really necessary to
> enable interrupts then the core will do it but with a warning now.

This WARN should disappear. It was there because it was up to the 
backend cpuidle driver to enable the irq. But in the meantime, that was 
consolidated into a single place in the cpuidle framework so no need to 
try to catch errors.

What about (based on this patchset).

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4505e2a..2d60cbb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -299,6 +299,7 @@ void arch_cpu_idle_dead(void)
  void arch_cpu_idle(void)
  {
         x86_idle();
+       local_irq_enable();
  }

  /*



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox