LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 3/3 v4] powerpc/mpic: FSL MPIC error interrupt support.
From: Kumar Gala @ 2012-08-06 15:52 UTC (permalink / raw)
  To: Varun Sethi; +Cc: Bogdan Hamciuc, linuxppc-dev
In-Reply-To: <1344257045-11200-1-git-send-email-Varun.Sethi@freescale.com>


On Aug 6, 2012, at 7:44 AM, Varun Sethi wrote:

> All SOC device error interrupts are muxed and delivered to the core
> as a single MPIC error interrupt. Currently all the device drivers
> requiring access to device errors have to register for the MPIC error
> interrupt as a shared interrupt.
>
> With this patch we add interrupt demuxing capability in the mpic driver,
> allowing device drivers to register for their individual error interrupts.
> This is achieved by handling error interrupts in a cascaded fashion.
>
> MPIC error interrupt is handled by the "error_int_handler", which
> subsequently demuxes it using the EISR and delivers it to the respective
> drivers.
>
> The error interrupt capability is dependent on the MPIC EIMR register,
> which was introduced in FSL MPIC version 4.1 (P4080 rev2). So, error
> interrupt demuxing capability is dependent on the MPIC version and can
> be used for versions >= 4.1.
>
> Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
> Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
> [In the initial version of the patch we were using handle_simple_irq
> as the handler for cascaded error interrupts, this resulted
> in issues in case of threaded isrs (with RT kernel). This issue was
> debugged by Bogdan and decision was taken to use the handle_level_irq
> handler]
> ---
> arch/powerpc/include/asm/mpic.h    |   16 ++++
> arch/powerpc/sysdev/Makefile       |    2 +-
> arch/powerpc/sysdev/fsl_mpic_err.c |  153 ++++++++++++++++++++++++++++++++++++
> arch/powerpc/sysdev/mpic.c         |   45 ++++++++++-
> arch/powerpc/sysdev/mpic.h         |   22 +++++
> 5 files changed, 236 insertions(+), 2 deletions(-)
> create mode 100644 arch/powerpc/sysdev/fsl_mpic_err.c
>
> diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h
> index e14d35d..6c8e53b 100644
> --- a/arch/powerpc/include/asm/mpic.h
> +++ b/arch/powerpc/include/asm/mpic.h
> @@ -118,6 +118,9 @@
> #define MPIC_MAX_CPUS		32
> #define MPIC_MAX_ISU		32
>
> +#define MPIC_MAX_ERR      32
> +#define MPIC_FSL_ERR_INT  16
> +
> /*
>  * Tsi108 implementation of MPIC has many differences from the original one
>  */
> @@ -270,6 +273,7 @@ struct mpic
> 	struct irq_chip		hc_ipi;
> #endif
> 	struct irq_chip		hc_tm;
> +	struct irq_chip		hc_err;
> 	const char		*name;
> 	/* Flags */
> 	unsigned int		flags;
> @@ -283,6 +287,8 @@ struct mpic
> 	/* vector numbers used for internal sources (ipi/timers) */
> 	unsigned int		ipi_vecs[4];
> 	unsigned int		timer_vecs[8];
> +	/* vector numbers used for FSL MPIC error interrupts */
> +	unsigned int		err_int_vecs[MPIC_MAX_ERR];
>
> 	/* Spurious vector to program into unused sources */
> 	unsigned int		spurious_vec;
> @@ -306,6 +312,11 @@ struct mpic
> 	struct mpic_reg_bank	cpuregs[MPIC_MAX_CPUS];
> 	struct mpic_reg_bank	isus[MPIC_MAX_ISU];
>
> +	/* ioremap'ed base for error interrupt registers */
> +	u32 __iomem	*err_regs;
> +	/* error interrupt config */
> +	u32			err_int_config_done;

I thought we were going to remove this, as it doesn't really provide any value.

> +
> 	/* Protected sources */
> 	unsigned long		*protected;
>
> @@ -370,6 +381,11 @@ struct mpic
> #define MPIC_NO_RESET			0x00004000
> /* Freescale MPIC (compatible includes "fsl,mpic") */
> #define MPIC_FSL			0x00008000
> +/* Freescale MPIC supports EIMR (error interrupt mask register).
> + * This flag is set for MPIC version >= 4.1 (version determined
> + * from the BRR1 register).
> +*/
> +#define MPIC_FSL_HAS_EIMR		0x00010000
>
> /* MPIC HW modification ID */
> #define MPIC_REGSET_MASK		0xf0000000

- k
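
[Editorial sketch] For readers following the demux discussion, here is a
minimal user-space model of the cascaded dispatch that the commit message
describes: take a snapshot of the error summary register (EISR) and call one
handler per set bit. This is not the code from fsl_mpic_err.c. The handler
table, the function names, and the MSB-first bit numbering are assumptions
made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERR 32  /* mirrors MPIC_MAX_ERR from the quoted patch */

/* Hypothetical per-source handler type; not an MPIC API. */
typedef void (*err_handler_t)(unsigned int src);

/* Example handler that only counts how often each source fired. */
static unsigned int hit_count[MAX_ERR];

static void count_hit(unsigned int src)
{
	hit_count[src]++;
}

/*
 * Dispatch one handler per bit set in the EISR snapshot.
 * Assumption: error source 0 is the most significant bit.
 * Returns the number of handlers that were called.
 */
static unsigned int demux_errors(uint32_t eisr, err_handler_t handlers[MAX_ERR])
{
	unsigned int handled = 0;

	assert(handlers != NULL);
	for (unsigned int src = 0; src < MAX_ERR; src++) {
		uint32_t mask = UINT32_C(1) << (MAX_ERR - 1 - src);

		/* Sources without a registered handler are skipped. */
		if ((eisr & mask) && handlers[src] != NULL) {
			handlers[src](src);
			handled++;
		}
	}
	return handled;
}
```

In this model a driver would install its own function in the table slot for
its error source instead of sharing a single handler with every other device.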

^ permalink raw reply

* Re: [PATCH 2/4] powerpc/booke: Merge the 32 bit e5500/e500mc cpu setup code.
From: Kumar Gala @ 2012-08-06 15:58 UTC (permalink / raw)
  To: Sethi Varun-B16395
  Cc: linuxppc-dev@lists.ozlabs.org, agraf@suse.de,
	kvm-ppc@vger.kernel.org
In-Reply-To: <C5ECD7A89D1DC44195F34B25E172658D15ADB3@039-SN2MPN1-013.039d.mgd.msft.net>


On Aug 4, 2012, at 1:31 PM, Sethi Varun-B16395 wrote:

>
>
>> -----Original Message-----
>> From: Kumar Gala [mailto:galak@kernel.crashing.org]
>> Sent: Friday, August 03, 2012 10:04 PM
>> To: Sethi Varun-B16395
>> Cc: agraf@suse.de; benh@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org;
>> kvm-ppc@vger.kernel.org
>> Subject: Re: [PATCH 2/4] powerpc/booke: Merge the 32 bit e5500/e500mc cpu
>> setup code.
>>
>>
>> On Jul 9, 2012, at 7:58 AM, Varun Sethi wrote:
>>
>>> Merge the 32 bit cpu setup code for e500mc/e5500 and define the "cpu_restore"
>>> routine (for e5500/e6500) only for the 64 bit case. The cpu_restore
>>> routine is used in the 64 bit case for setting up the secondary cores.
>>>
>>> Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
>>> ---
>>> arch/powerpc/kernel/cpu_setup_fsl_booke.S |    1 +
>>> arch/powerpc/kernel/cputable.c            |    4 ++++
>>> 2 files changed, 5 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
>>> b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
>>> index a55d028..5e87737 100644
>>> --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
>>> +++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
>>> @@ -75,6 +75,7 @@ _GLOBAL(__setup_cpu_e500v2)
>>> 	mtlr	r4
>>> 	blr
>>> _GLOBAL(__setup_cpu_e500mc)
>>> +_GLOBAL(__setup_cpu_e5500)
>>
>> This is a bit confusing, as we now have duplicated __setup_cpu_e5500()
>> between the ppc32 and ppc64 cases.
>>
>> If you build this patch for corenet32_smp_defconfig it fails.
> [Sethi Varun-B16395] I am able to build without any issue with the same config.
>
> -Varun

If you build corenet32_smp_defconfig at commit:

commit c5537ef2d672d2cf48d4e4ac754781c8db112843
Author: Varun Sethi <Varun.Sethi@freescale.com>
Date:   Mon Jul 9 18:28:21 2012 +0530

    powerpc/booke: Merge the 32 bit e5500/e500mc cpu setup code.

You get the following build error:

arch/powerpc/kernel/cpu_setup_fsl_booke.S: Assembler messages:
arch/powerpc/kernel/cpu_setup_fsl_booke.S:110: Error: symbol `__setup_cpu_e5500' is already defined

- k

^ permalink raw reply

* Re: [PATCH 1/1] booke/wdt: fix incorrect WDIOC_GETSUPPORT return path
From: Kumar Gala @ 2012-08-06 16:13 UTC (permalink / raw)
  To: tiejun.chen, Timur Tabi; +Cc: linux-watchdog, linuxppc-dev@ozlabs.org list
In-Reply-To: <501F35C2.6090005@windriver.com>


On Aug 5, 2012, at 10:10 PM, tiejun.chen wrote:

> On 07/30/2012 04:15 PM, Tiejun Chen wrote:
>> We miss that correct WDIOC_GETSUPPORT return path when perform
>> copy_to_user() properly.
> 
> Any comments?
> 
> Thanks
> Tiejun

Adding Timur, as he's touched watchdog last.

- k

>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> ---
>> drivers/watchdog/booke_wdt.c |    7 ++++---
>> 1 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
>> index 3fe82d0..2be7f29 100644
>> --- a/drivers/watchdog/booke_wdt.c
>> +++ b/drivers/watchdog/booke_wdt.c
>> @@ -162,12 +162,13 @@ static long booke_wdt_ioctl(struct file *file,
>> 				unsigned int cmd, unsigned long arg)
>> {
>> 	u32 tmp = 0;
>> -	u32 __user *p = (u32 __user *)arg;
>> +	void __user *argp = (u32 __user *)arg;
>> +	u32 __user *p = argp;
>> 
>> 	switch (cmd) {
>> 	case WDIOC_GETSUPPORT:
>> -		if (copy_to_user((void *)arg, &ident, sizeof(ident)))
>> -			return -EFAULT;
>> +		return copy_to_user(argp, &ident,
>> +				sizeof(ident)) ? -EFAULT : 0;
>> 	case WDIOC_GETSTATUS:
>> 		return put_user(0, p);
>> 	case WDIOC_GETBOOTSTATUS:
>> 

^ permalink raw reply

* RE: [PATCH 3/3 v4] powerpc/mpic: FSL MPIC error interrupt support.
From: Sethi Varun-B16395 @ 2012-08-06 16:22 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev@lists.ozlabs.org, Hamciuc Bogdan-BHAMCIU1
In-Reply-To: <FAD36277-EA2D-49D7-BD98-2F132CA886AA@kernel.crashing.org>



> -----Original Message-----
> From: Kumar Gala [mailto:galak@kernel.crashing.org]
> Sent: Monday, August 06, 2012 9:23 PM
> To: Sethi Varun-B16395
> Cc: linuxppc-dev@lists.ozlabs.org; Hamciuc Bogdan-BHAMCIU1
> Subject: Re: [PATCH 3/3 v4] powerpc/mpic: FSL MPIC error interrupt
> support.
>
>
> On Aug 6, 2012, at 7:44 AM, Varun Sethi wrote:
>
> > All SOC device error interrupts are muxed and delivered to the core as
> > a single MPIC error interrupt. Currently all the device drivers
> > requiring access to device errors have to register for the MPIC error
> > interrupt as a shared interrupt.
> >
> > With this patch we add interrupt demuxing capability in the mpic
> > driver, allowing device drivers to register for their individual error
> > interrupts.
> > This is achieved by handling error interrupts in a cascaded fashion.
> >
> > MPIC error interrupt is handled by the "error_int_handler", which
> > subsequently demuxes it using the EISR and delivers it to the
> > respective drivers.
> >
> > The error interrupt capability is dependent on the MPIC EIMR register,
> > which was introduced in FSL MPIC version 4.1 (P4080 rev2). So, error
> > interrupt demuxing capability is dependent on the MPIC version and can
> > be used for versions >= 4.1.
> >
> > Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
> > Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
> > [In the initial version of the patch we were using handle_simple_irq
> > as the handler for cascaded error interrupts, this resulted
> > in issues in case of threaded isrs (with RT kernel). This issue was
> > debugged by Bogdan and decision was taken to use the handle_level_irq
> > handler]
> > ---
> > arch/powerpc/include/asm/mpic.h    |   16 ++++
> > arch/powerpc/sysdev/Makefile       |    2 +-
> > arch/powerpc/sysdev/fsl_mpic_err.c |  153 ++++++++++++++++++++++++++++++++++++
> > arch/powerpc/sysdev/mpic.c         |   45 ++++++++++-
> > arch/powerpc/sysdev/mpic.h         |   22 +++++
> > 5 files changed, 236 insertions(+), 2 deletions(-)
> > create mode 100644 arch/powerpc/sysdev/fsl_mpic_err.c
> >
> > diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h
> > index e14d35d..6c8e53b 100644
> > --- a/arch/powerpc/include/asm/mpic.h
> > +++ b/arch/powerpc/include/asm/mpic.h
> > @@ -118,6 +118,9 @@
> > #define MPIC_MAX_CPUS		32
> > #define MPIC_MAX_ISU		32
> >
> > +#define MPIC_MAX_ERR      32
> > +#define MPIC_FSL_ERR_INT  16
> > +
> > /*
> >  * Tsi108 implementation of MPIC has many differences from the original one
> >  */
> > @@ -270,6 +273,7 @@ struct mpic
> > 	struct irq_chip		hc_ipi;
> > #endif
> > 	struct irq_chip		hc_tm;
> > +	struct irq_chip		hc_err;
> > 	const char		*name;
> > 	/* Flags */
> > 	unsigned int		flags;
> > @@ -283,6 +287,8 @@ struct mpic
> > 	/* vector numbers used for internal sources (ipi/timers) */
> > 	unsigned int		ipi_vecs[4];
> > 	unsigned int		timer_vecs[8];
> > +	/* vector numbers used for FSL MPIC error interrupts */
> > +	unsigned int		err_int_vecs[MPIC_MAX_ERR];
> >
> > 	/* Spurious vector to program into unused sources */
> > 	unsigned int		spurious_vec;
> > @@ -306,6 +312,11 @@ struct mpic
> > 	struct mpic_reg_bank	cpuregs[MPIC_MAX_CPUS];
> > 	struct mpic_reg_bank	isus[MPIC_MAX_ISU];
> >
> > +	/* ioremap'ed base for error interrupt registers */
> > +	u32 __iomem	*err_regs;
> > +	/* error interrupt config */
> > +	u32			err_int_config_done;
>
> I thought we were going to remove this, as it doesn't really provide any
> value.
>
[Sethi Varun-B16395] We need a way to determine that the irq handler got
registered for the MPIC error interrupt; only then can we go ahead and assign
the individual (cascaded) error interrupts. Initially we were doing this while
translating the error interrupt specifier; now we register the handler in
mpic_init.

-Varun
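
[Editorial sketch] The ordering constraint described above can be modeled as
a flag that is set once the parent (muxed) error interrupt handler is
installed and checked before any cascaded error interrupt is handed out. The
struct, the function names, and the "base vector plus source index" numbering
are hypothetical; the sketch only illustrates the role that
err_int_config_done plays in the reply, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERR 32  /* mirrors MPIC_MAX_ERR from the quoted patch */

/* Hypothetical stand-in for the relevant part of struct mpic. */
struct err_state {
	bool parent_registered;  /* plays the role of err_int_config_done */
};

/* Called once from init, after the parent handler has been installed. */
static void mark_parent_registered(struct err_state *st)
{
	assert(st != NULL);
	st->parent_registered = true;
}

/*
 * Map an error source to a cascaded interrupt number.
 * Returns -1 when the parent handler is missing, because a cascaded
 * interrupt could never be delivered in that case.
 */
static int map_err_int(const struct err_state *st, int base_vec, int src)
{
	assert(st != NULL);
	if (!st->parent_registered || src < 0 || src >= MAX_ERR)
		return -1;
	return base_vec + src;
}
```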

^ permalink raw reply

* RE: [PATCH 2/4] powerpc/booke: Merge the 32 bit e5500/e500mc cpu setup code.
From: Sethi Varun-B16395 @ 2012-08-06 16:24 UTC (permalink / raw)
  To: Kumar Gala
  Cc: linuxppc-dev@lists.ozlabs.org, agraf@suse.de,
	kvm-ppc@vger.kernel.org
In-Reply-To: <D762F063-8210-41EF-B583-F277967EA279@kernel.crashing.org>



> -----Original Message-----
> From: Kumar Gala [mailto:galak@kernel.crashing.org]
> Sent: Monday, August 06, 2012 9:28 PM
> To: Sethi Varun-B16395
> Cc: agraf@suse.de; benh@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org;
> kvm-ppc@vger.kernel.org
> Subject: Re: [PATCH 2/4] powerpc/booke: Merge the 32 bit e5500/e500mc cpu
> setup code.
>
>
> On Aug 4, 2012, at 1:31 PM, Sethi Varun-B16395 wrote:
>
> >
> >
> >> -----Original Message-----
> >> From: Kumar Gala [mailto:galak@kernel.crashing.org]
> >> Sent: Friday, August 03, 2012 10:04 PM
> >> To: Sethi Varun-B16395
> >> Cc: agraf@suse.de; benh@kernel.crashing.org; linuxppc-
> >> dev@lists.ozlabs.org; kvm-ppc@vger.kernel.org
> >> Subject: Re: [PATCH 2/4] powerpc/booke: Merge the 32 bit e5500/e500mc
> >> cpu setup code.
> >>
> >>
> >> On Jul 9, 2012, at 7:58 AM, Varun Sethi wrote:
> >>
> >>> Merge the 32 bit cpu setup code for e500mc/e5500 and define the "cpu_restore"
> >>> routine (for e5500/e6500) only for the 64 bit case. The cpu_restore
> >>> routine is used in the 64 bit case for setting up the secondary cores.
> >>>
> >>> Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
> >>> ---
> >>> arch/powerpc/kernel/cpu_setup_fsl_booke.S |    1 +
> >>> arch/powerpc/kernel/cputable.c            |    4 ++++
> >>> 2 files changed, 5 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> >>> b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> >>> index a55d028..5e87737 100644
> >>> --- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> >>> +++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
> >>> @@ -75,6 +75,7 @@ _GLOBAL(__setup_cpu_e500v2)
> >>> 	mtlr	r4
> >>> 	blr
> >>> _GLOBAL(__setup_cpu_e500mc)
> >>> +_GLOBAL(__setup_cpu_e5500)
> >>
> >> This is a bit confusing, as we now have duplicated
> >> __setup_cpu_e5500() between the ppc32 and ppc64 cases.
> >>
> >> If you build this patch for corenet32_smp_defconfig it fails.
> > [Sethi Varun-B16395] I am able to build without any issue with the same
> > config.
> >
> > -Varun
>
> If you build corenet32_smp_defconfig at commit:
>
> commit c5537ef2d672d2cf48d4e4ac754781c8db112843
> Author: Varun Sethi <Varun.Sethi@freescale.com>
> Date:   Mon Jul 9 18:28:21 2012 +0530
>
>     powerpc/booke: Merge the 32 bit e5500/e500mc cpu setup code.
>
> You get the following build error:
>
> arch/powerpc/kernel/cpu_setup_fsl_booke.S: Assembler messages:
> arch/powerpc/kernel/cpu_setup_fsl_booke.S:110: Error: symbol
> `__setup_cpu_e5500' is already defined
>
Oh, I didn't realize that. Thanks for fixing this.

-Varun

^ permalink raw reply

* Re: [PATCH v6 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx
From: Ira W. Snyder @ 2012-08-06 17:51 UTC (permalink / raw)
  To: qiang.liu
  Cc: arnd, vinod.koul, gregkh, linux-kernel, dan.j.williams, herbert,
	linux-crypto, dan.j.williams, linuxppc-dev, davem
In-Reply-To: <1344248073-9276-1-git-send-email-qiang.liu@freescale.com>

On Mon, Aug 06, 2012 at 06:14:33PM +0800, qiang.liu@freescale.com wrote:
> From: Qiang Liu <qiang.liu@freescale.com>
> 
> Fix a potential race when both NET_DMA and ASYNC_TX are enabled.
> The current descriptor release process lacks async_tx support:
> all descriptors are released whether or not they have been acked by async_tx,
> so there is a potential race condition when the dma engine is used by other
> clients (e.g. when NET_DMA is enabled to offload TCP).
> 
> In our case, the race shows up when both talitos and dmaengine are used
> to offload xor: the napi scheduler syncs all pending requests in the dma
> channels, which disturbs the raid operations because fsl dma does not
> check the ack flag. A descriptor that was just submitted and not yet acked
> is freed; when it is then used as a dependent tx, the freed descriptor triggers
> BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> 
> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf40000 CPU: 0
> GPR00: 00000001 ecf41ca0 ee1a94a0 0000003f 00000001 c00593e4 00000000 00000001
> GPR08: 00000000 a7a7a7a7 00000001 00000002 42028042 100a38d4 ed576d98 00000000
> GPR16: ed5a11b0 00000000 2b162000 00000200 00000000 2d555000 ed3015e8 c15a7aa0
> GPR24: 00000000 c155fc40 00000000 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> LR [c02b068c] async_tx_submit+0x26c/0x2b4
> Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> [ecf41f90] [c008277c] kthread+0x8c/0x90
> [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> 
> Another change in this patch is how completed descriptors are identified.
> Currently all descriptors in the ld_running list are assumed to be completed
> when an interrupt is raised. This works fine under normal conditions, but
> not when the interrupt is caused by an exception. Completion should not be
> inferred from the s/w list; the right way is
> to read the current descriptor address register to find the last completed
> descriptor. If an interrupt is raised by an error, the descriptors in ld_running
> must not all be treated as finished, or the unfinished descriptors in ld_running
> will be released wrongly.
> 
> A simple way to reproduce:
> Enable dmatest first, then insert some bad descriptors which can trigger
> Programming Error interrupts before the good descriptors. As a result, the good
> descriptors will be freed before they are processed because of the exception
> interrupt.
> 
> Note: the bad descriptors are only for simulating an exception interrupt.
> This case can illustrate the potential risk in current fsl-dma very well.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dan Williams <dan.j.williams@gmail.com>
> Cc: Vinod Koul <vinod.koul@intel.com>
> Cc: Li Yang <leoli@freescale.com>
> Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>

There are two minor nitpicks below. Other than that, the patch looks
excellent to me.

Ira

> ---
>  drivers/dma/fsldma.c |  232 ++++++++++++++++++++++++++++++++++----------------
>  drivers/dma/fsldma.h |   17 +++-
>  2 files changed, 174 insertions(+), 75 deletions(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 36490a3..938d8c1 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -472,6 +472,110 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
>  }
> 
>  /**
> + * fsldma_clean_completed_descriptor - free all descriptors which
> + * has been completed and acked
> + * @chan: Freescale DMA channel
> + *
> + * This function is used on all completed and acked descriptors.
> + * All descriptors should only be freed in this function.
> + */
> +static void
> +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> +{
> +	struct fsl_desc_sw *desc, *_desc;
> +
> +	/* Run the callback for each descriptor, in order */
> +	list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node)
> +		if (async_tx_test_ack(&desc->async_tx))
> +			fsl_dma_free_descriptor(chan, desc);
> +}
> +
> +/**
> + * fsldma_run_tx_complete_actions - cleanup a single link descriptor
> + * @chan: Freescale DMA channel
> + * @desc: descriptor to cleanup and free
> + * @cookie: Freescale DMA transaction identifier
> + *
> + * This function is used on a descriptor which has been executed by the DMA
> + * controller. It will run any callbacks, submit any dependencies.
> + */
> +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
> +		struct fsl_desc_sw *desc, dma_cookie_t cookie)
> +{
> +	struct dma_async_tx_descriptor *txd = &desc->async_tx;
> +	struct device *dev = chan->common.device->dev;
> +	dma_addr_t src = get_desc_src(chan, desc);
> +	dma_addr_t dst = get_desc_dst(chan, desc);
> +	u32 len = get_desc_cnt(chan, desc);
> +
> +	BUG_ON(txd->cookie < 0);
> +
> +	if (txd->cookie > 0) {
> +		cookie = txd->cookie;
> +
> +		/* Run the link descriptor callback function */
> +		if (txd->callback) {
> +#ifdef FSL_DMA_LD_DEBUG
> +			chan_dbg(chan, "LD %p callback\n", desc);
> +#endif
> +			txd->callback(txd->callback_param);
> +		}
> +
> +		/* Unmap the dst buffer, if requested */
> +		if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
> +			if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
> +				dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
> +			else
> +				dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
> +		}
> +
> +		/* Unmap the src buffer, if requested */
> +		if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
> +			if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
> +				dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
> +			else
> +				dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
> +		}
> +	}
> +
> +	/* Run any dependencies */
> +	dma_run_dependencies(txd);
> +
> +	return cookie;
> +}
> +
> +/**
> + * fsldma_clean_running_descriptor - move the completed descriptor from
> + * ld_running to ld_completed
> + * @chan: Freescale DMA channel
> + * @desc: the descriptor which is completed
> + *
> + * Free the descriptor directly if acked by async_tx api, or move it to
> + * queue ld_completed.
> + */
> +static void
> +fsldma_clean_running_descriptor(struct fsldma_chan *chan,
> +		struct fsl_desc_sw *desc)
> +{
> +	/* Remove from the list of transactions */
> +	list_del(&desc->node);

Minor nitpick. Add a blank line here.

> +	/*
> +	 * the client is allowed to attach dependent operations
> +	 * until 'ack' is set
> +	 */
> +	if (!async_tx_test_ack(&desc->async_tx)) {
> +		/*
> +		 * Move this descriptor to the list of descriptors which is
> +		 * completed, but still awaiting the 'ack' bit to be set.
> +		 */
> +		list_add_tail(&desc->node, &chan->ld_completed);
> +		return;
> +	}
> +
> +	dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
> +}
> +
> +/**
>   * fsl_chan_xfer_ld_queue - transfer any pending transactions
>   * @chan : Freescale DMA channel
>   *
> @@ -539,51 +643,58 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
>  }
> 
>  /**
> - * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
> + * fsldma_cleanup_descriptors - cleanup link descriptors which are completed
> + * and move them to ld_completed to free until flag 'ack' is set
>   * @chan: Freescale DMA channel
> - * @desc: descriptor to cleanup and free
>   *
> - * This function is used on a descriptor which has been executed by the DMA
> - * controller. It will run any callbacks, submit any dependencies, and then
> - * free the descriptor.
> + * This function is used on descriptors which have been executed by the DMA
> + * controller. It will run any callbacks, submit any dependencies, then
> + * free these descriptors if flag 'ack' is set.
>   */
> -static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
> -				      struct fsl_desc_sw *desc)
> +static void fsldma_cleanup_descriptors(struct fsldma_chan *chan)
>  {
> -	struct dma_async_tx_descriptor *txd = &desc->async_tx;
> -	struct device *dev = chan->common.device->dev;
> -	dma_addr_t src = get_desc_src(chan, desc);
> -	dma_addr_t dst = get_desc_dst(chan, desc);
> -	u32 len = get_desc_cnt(chan, desc);
> +	struct fsl_desc_sw *desc, *_desc;
> +	dma_cookie_t cookie = 0;
> +	dma_addr_t curr_phys = get_cdar(chan);
> +	int seen_current = 0;
> 
> -	/* Run the link descriptor callback function */
> -	if (txd->callback) {
> -#ifdef FSL_DMA_LD_DEBUG
> -		chan_dbg(chan, "LD %p callback\n", desc);
> -#endif
> -		txd->callback(txd->callback_param);
> -	}
> +	fsldma_clean_completed_descriptor(chan);
> 
> -	/* Run any dependencies */
> -	dma_run_dependencies(txd);
> +	/* Run the callback for each descriptor, in order */
> +	list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> +		/*
> +		 * do not advance past the current descriptor loaded into the
> +		 * hardware channel, subsequent descriptors are either in
> +		 * process or have not been submitted
> +		 */
> +		if (seen_current)
> +			break;
> 
> -	/* Unmap the dst buffer, if requested */
> -	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
> -		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
> -			dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
> -		else
> -			dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
> -	}
> +		/*
> +		 * stop the search if we reach the current descriptor and the
> +		 * channel is busy
> +		 */
> +		if (desc->async_tx.phys == curr_phys) {
> +			seen_current = 1;
> +			if (!dma_is_idle(chan))
> +				break;
> +		}

I wonder if this is better:

if (desc->async_tx.phys == get_cdar(chan)) {
	seen_current = 1;
	if (!dma_is_idle(chan))
		break;
}

Your version works fine, it just might stop earlier than necessary. This
is just a nitpick.
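
[Editorial sketch] A small user-space model of the walk under discussion may
help: descriptors ahead of the one currently loaded in the hardware count as
completed, and the current one counts only if the channel is idle. The struct
and function names here are hypothetical, and the array stands in for the
ld_running list; `curr_phys` plays the role of the value returned by
get_cdar() and `idle` the result of dma_is_idle().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only what the walk needs. */
struct desc {
	uint32_t phys;  /* bus address the hardware would load */
	bool acked;     /* set by the client once it is done with it */
};

/*
 * Count how many descriptors at the head of `running` may be treated as
 * completed, given the current descriptor address and the channel state.
 */
static size_t count_completed(const struct desc *running, size_t n,
			      uint32_t curr_phys, bool idle)
{
	size_t done = 0;

	assert(running != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (running[i].phys == curr_phys) {
			/* The current descriptor is finished only if the channel is idle. */
			if (idle)
				done++;
			break;
		}
		done++;
	}
	return done;
}

/* A completed descriptor is freed only once acked; otherwise it is parked. */
static bool can_free_now(const struct desc *d)
{
	assert(d != NULL);
	return d->acked;
}
```

Passing `curr_phys` in once corresponds to the cached read in the patch;
re-reading the register on every iteration, as suggested above, would amount
to calling the hardware accessor inside the loop instead.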

> 
> -	/* Unmap the src buffer, if requested */
> -	if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
> -		if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
> -			dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
> -		else
> -			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
> +		cookie = fsldma_run_tx_complete_actions(chan, desc, cookie);
> +
> +		fsldma_clean_running_descriptor(chan, desc);
>  	}
> 
> -	fsl_dma_free_descriptor(chan, desc);
> +	/*
> +	 * Start any pending transactions automatically
> +	 *
> +	 * In the ideal case, we keep the DMA controller busy while we go
> +	 * ahead and free the descriptors below.
> +	 */
> +	fsl_chan_xfer_ld_queue(chan);
> +
> +	if (cookie > 0)
> +		chan->common.completed_cookie = cookie;
>  }
> 
>  /**
> @@ -654,8 +765,10 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
> 
>  	chan_dbg(chan, "free all channel resources\n");
>  	spin_lock_irqsave(&chan->desc_lock, flags);
> +	fsldma_cleanup_descriptors(chan);
>  	fsldma_free_desc_list(chan, &chan->ld_pending);
>  	fsldma_free_desc_list(chan, &chan->ld_running);
> +	fsldma_free_desc_list(chan, &chan->ld_completed);
>  	spin_unlock_irqrestore(&chan->desc_lock, flags);
> 
>  	dma_pool_destroy(chan->desc_pool);
> @@ -893,6 +1006,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
>  		/* Remove and free all of the descriptors in the LD queue */
>  		fsldma_free_desc_list(chan, &chan->ld_pending);
>  		fsldma_free_desc_list(chan, &chan->ld_running);
> +		fsldma_free_desc_list(chan, &chan->ld_completed);
>  		chan->idle = true;
> 
>  		spin_unlock_irqrestore(&chan->desc_lock, flags);
> @@ -956,11 +1070,15 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
>  	enum dma_status ret;
>  	unsigned long flags;
> 
> -	spin_lock_irqsave(&chan->desc_lock, flags);
>  	ret = dma_cookie_status(dchan, cookie, txstate);
> +	if (ret == DMA_SUCCESS)
> +		return ret;
> +
> +	spin_lock_irqsave(&chan->desc_lock, flags);
> +	fsldma_cleanup_descriptors(chan);
>  	spin_unlock_irqrestore(&chan->desc_lock, flags);
> 
> -	return ret;
> +	return dma_cookie_status(dchan, cookie, txstate);
>  }
> 
>  /*----------------------------------------------------------------------------*/
> @@ -1037,52 +1155,19 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
>  static void dma_do_tasklet(unsigned long data)
>  {
>  	struct fsldma_chan *chan = (struct fsldma_chan *)data;
> -	struct fsl_desc_sw *desc, *_desc;
> -	LIST_HEAD(ld_cleanup);
>  	unsigned long flags;
> 
>  	chan_dbg(chan, "tasklet entry\n");
> 
>  	spin_lock_irqsave(&chan->desc_lock, flags);
> 
> -	/* update the cookie if we have some descriptors to cleanup */
> -	if (!list_empty(&chan->ld_running)) {
> -		dma_cookie_t cookie;
> -
> -		desc = to_fsl_desc(chan->ld_running.prev);
> -		cookie = desc->async_tx.cookie;
> -		dma_cookie_complete(&desc->async_tx);
> -
> -		chan_dbg(chan, "completed_cookie=%d\n", cookie);
> -	}
> -
> -	/*
> -	 * move the descriptors to a temporary list so we can drop the lock
> -	 * during the entire cleanup operation
> -	 */
> -	list_splice_tail_init(&chan->ld_running, &ld_cleanup);
> -
>  	/* the hardware is now idle and ready for more */
>  	chan->idle = true;
> 
> -	/*
> -	 * Start any pending transactions automatically
> -	 *
> -	 * In the ideal case, we keep the DMA controller busy while we go
> -	 * ahead and free the descriptors below.
> -	 */
> -	fsl_chan_xfer_ld_queue(chan);
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> -
> -	/* Run the callback for each descriptor, in order */
> -	list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
> +	/* Run all cleanup for descriptors which have been completed */
> +	fsldma_cleanup_descriptors(chan);
> 
> -		/* Remove from the list of transactions */
> -		list_del(&desc->node);
> -
> -		/* Run all cleanup for this descriptor */
> -		fsldma_cleanup_descriptor(chan, desc);
> -	}
> +	spin_unlock_irqrestore(&chan->desc_lock, flags);
> 
>  	chan_dbg(chan, "tasklet exit\n");
>  }
> @@ -1264,6 +1349,7 @@ static int __devinit fsl_dma_chan_probe(struct fsldma_device *fdev,
>  	spin_lock_init(&chan->desc_lock);
>  	INIT_LIST_HEAD(&chan->ld_pending);
>  	INIT_LIST_HEAD(&chan->ld_running);
> +	INIT_LIST_HEAD(&chan->ld_completed);
>  	chan->idle = true;
> 
>  	chan->common.device = &fdev->common;
> diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
> index f5c3879..a58275a 100644
> --- a/drivers/dma/fsldma.h
> +++ b/drivers/dma/fsldma.h
> @@ -138,8 +138,21 @@ struct fsldma_chan {
>  	char name[8];			/* Channel name */
>  	struct fsldma_chan_regs __iomem *regs;
>  	spinlock_t desc_lock;		/* Descriptor operation lock */
> -	struct list_head ld_pending;	/* Link descriptors queue */
> -	struct list_head ld_running;	/* Link descriptors queue */
> +	/*
> +	 * Descriptors which are queued to run, but have not yet been
> +	 * submitted to the hardware for execution
> +	 */
> +	struct list_head ld_pending;
> +	/*
> +	 * Descriptors which are currently being executed by the hardware
> +	 */
> +	struct list_head ld_running;
> +	/*
> +	 * Descriptors which have finished execution by the hardware. These
> +	 * descriptors have already had their cleanup actions run. They are
> +	 * waiting for the ACK bit to be set by the async_tx API.
> +	 */
> +	struct list_head ld_completed;	/* Link descriptors queue */
>  	struct dma_chan common;		/* DMA common channel */
>  	struct dma_pool *desc_pool;	/* Descriptors pool */
>  	struct device *dev;		/* Channel device */
> --
> 1.7.5.1

^ permalink raw reply

* Re: [PATCH 1/1] booke/wdt: fix incorrect WDIOC_GETSUPPORT return path
From: Tabi Timur-B04825 @ 2012-08-06 23:27 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linux-watchdog@vger.kernel.org, linuxppc-dev@ozlabs.org
In-Reply-To: <1343636122-23273-1-git-send-email-tiejun.chen@windriver.com>

On Mon, Jul 30, 2012 at 3:15 AM, Tiejun Chen <tiejun.chen@windriver.com> wrote:
> We miss that correct WDIOC_GETSUPPORT return path when perform
> copy_to_user() properly.

Thanks for catching this.  I'm amazed that this driver still has bugs like this.

> diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
> index 3fe82d0..2be7f29 100644
> --- a/drivers/watchdog/booke_wdt.c
> +++ b/drivers/watchdog/booke_wdt.c
> @@ -162,12 +162,13 @@ static long booke_wdt_ioctl(struct file *file,
>                                 unsigned int cmd, unsigned long arg)
>  {
>         u32 tmp = 0;
> -       u32 __user *p = (u32 __user *)arg;
> +       void __user *argp = (u32 __user *)arg;
> +       u32 __user *p = argp;

You don't need to create 'argp'.  The existing 'p' variable will work
in the copy_to_user() call.

> +               return copy_to_user(argp, &ident,
> +                               sizeof(ident)) ? -EFAULT : 0;

This can fit in one line, especially if you use 'p' instead of 'argp'.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH 1/1] booke/wdt: fix incorrect WDIOC_GETSUPPORT return path
From: Tabi Timur-B04825 @ 2012-08-06 23:36 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linux-watchdog@vger.kernel.org, linuxppc-dev@ozlabs.org
In-Reply-To: <CAOZdJXWftkUSw57-Qbe9JFP9NwodkHprKr6_1xi6ZzzLN8A4wA@mail.gmail.com>

On Mon, Aug 6, 2012 at 2:12 PM, Tabi Timur-B04825 <b04825@freescale.com> wrote:
> On Mon, Jul 30, 2012 at 3:15 AM, Tiejun Chen <tiejun.chen@windriver.com> wrote:
>> We miss that correct WDIOC_GETSUPPORT return path when perform
>> copy_to_user() properly.
>
> Thanks for catching this.  I'm amazed that this driver still has bugs like this.

While you're at it, I found a few related bugs.  Can you fix these, also?

1.	case WDIOC_SETOPTIONS:
		if (get_user(tmp, p))
			return -EINVAL;

This should return -EFAULT.

2. 	case WDIOC_GETBOOTSTATUS:
		/* XXX: something is clearing TSR */
		tmp = mfspr(SPRN_TSR) & TSR_WRS(3);
		/* returns CARDRESET if last reset was caused by the WDT */
		return (tmp ? WDIOF_CARDRESET : 0);

This should use put_user() to return the value, instead of returning
it as a return code.
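
To make the difference concrete, here is a minimal userspace sketch of the
two conventions. It is not the driver code: mock_put_user(), DEMO_CARDRESET,
getbootstatus_old() and getbootstatus_new() are hypothetical names invented
for illustration, and the mock only approximates put_user() by treating a
NULL pointer as a fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WDIOF_CARDRESET; the value is only for the demo. */
#define DEMO_CARDRESET 0x20u

/* Userspace mock of put_user(): 0 on success, -EFAULT on a bad pointer. */
static int mock_put_user(uint32_t val, uint32_t *uptr)
{
	if (uptr == NULL)
		return -EFAULT;
	*uptr = val;
	return 0;
}

/* Old behaviour: the status flag leaks out as the ioctl return code. */
static long getbootstatus_old(uint32_t tsr_wrs)
{
	return tsr_wrs ? DEMO_CARDRESET : 0;
}

/* Suggested behaviour: the flag goes through the pointer; return 0 or -EFAULT. */
static long getbootstatus_new(uint32_t tsr_wrs, uint32_t *uptr)
{
	return mock_put_user(tsr_wrs ? DEMO_CARDRESET : 0, uptr);
}

/* Small self-check; call it from main() to exercise both variants. */
static void demo(void)
{
	uint32_t out = 0;

	assert(getbootstatus_old(1) == DEMO_CARDRESET);	/* value as return code */
	assert(getbootstatus_new(1, &out) == 0);	/* success is 0 */
	assert(out == DEMO_CARDRESET);			/* value via pointer */
	assert(getbootstatus_new(1, NULL) == -EFAULT);	/* fault is reported */
}
```

With the old form, a caller that treats any non-zero ioctl result as a
failure would misread a set status flag as an error; with the new form the
return code only ever carries 0 or -EFAULT.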

You can title the new patch something like, "booke/wdt: some ioctls do
not return values properly"

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH v6 0/8] Raid: enable talitos xor offload for improving performance
From: Kim Phillips @ 2012-08-07  1:35 UTC (permalink / raw)
  To: qiang.liu
  Cc: arnd, vinod.koul, gregkh, linux-kernel, dan.j.williams, herbert,
	linux-crypto, dan.j.williams, linuxppc-dev
In-Reply-To: <1344247815-1104-1-git-send-email-qiang.liu@freescale.com>

On Mon, 6 Aug 2012 18:10:15 +0800
<qiang.liu@freescale.com> wrote:

> Changes in v6:
> 	- swap the order of original patch 3/6 and 4/6;
> 	- merge Ira's patch to reduce the size of original patch;
> 	- merge Ira's patch of carma in 8/8;
> 	- update documents and descriptions according to Ira's advice;

fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
[1], and this patchseries, along with FSL_DMA && NET_DMA set seems
to be holding water, so this series gets my:

Tested-by: Kim Phillips <kim.phillips@freescale.com>

Thanks,

Kim

[1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-devices=4 /dev/ram[0123]

^ permalink raw reply

* Re: [PATCH 1/1] booke/wdt: fix incorrect WDIOC_GETSUPPORT return path
From: tiejun.chen @ 2012-08-07  1:56 UTC (permalink / raw)
  To: Tabi Timur-B04825; +Cc: linux-watchdog@vger.kernel.org, linuxppc-dev@ozlabs.org
In-Reply-To: <CAOZdJXWYyc_=iikFwSLy4gX7Grag=jxvWhYrP4Y9U3QZcwieBw@mail.gmail.com>

On 08/07/2012 09:19 AM, Tabi Timur-B04825 wrote:
> On Mon, Aug 6, 2012 at 2:12 PM, Tabi Timur-B04825 <b04825@freescale.com> wrote:
>> On Mon, Jul 30, 2012 at 3:15 AM, Tiejun Chen <tiejun.chen@windriver.com> wrote:
>>> We miss that correct WDIOC_GETSUPPORT return path when perform
>>> copy_to_user() properly.
>>
>> Thanks for catching this.  I'm amazed that this driver still has bugs like this.
> 
> While you're at it, I found a few related bugs.  Can you fix these, also?
> 
> 1.	case WDIOC_SETOPTIONS:
> 		if (get_user(tmp, p))
> 			return -EINVAL;
> 
> This should return -EFAULT.
> 
> 2. 	case WDIOC_GETBOOTSTATUS:
> 		/* XXX: something is clearing TSR */
> 		tmp = mfspr(SPRN_TSR) & TSR_WRS(3);
> 		/* returns CARDRESET if last reset was caused by the WDT */
> 		return (tmp ? WDIOF_CARDRESET : 0);
> 
> This should use put_user() to return the value, instead of returning
> it as a return code.
> 
> You can title the new patch something like, "booke/wdt: some ioctls do
> not return values properly"

I will regenerate this patch as v2, including fixes for these errors.

Thanks
Tiejun

^ permalink raw reply

* [v2 PATCH 1/1] booke/wdt: some ioctls do not return values properly
From: Tiejun Chen @ 2012-08-07  1:59 UTC (permalink / raw)
  To: B04825; +Cc: linuxppc-dev, linux-watchdog

Fix some booke wdt ioctls return value error.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
 drivers/watchdog/booke_wdt.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index 3fe82d0..5b06d31 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -166,18 +166,17 @@ static long booke_wdt_ioctl(struct file *file,
 
 	switch (cmd) {
 	case WDIOC_GETSUPPORT:
-		if (copy_to_user((void *)arg, &ident, sizeof(ident)))
-			return -EFAULT;
+		return copy_to_user(p, &ident, sizeof(ident)) ? -EFAULT : 0;
 	case WDIOC_GETSTATUS:
 		return put_user(0, p);
 	case WDIOC_GETBOOTSTATUS:
 		/* XXX: something is clearing TSR */
 		tmp = mfspr(SPRN_TSR) & TSR_WRS(3);
 		/* returns CARDRESET if last reset was caused by the WDT */
-		return (tmp ? WDIOF_CARDRESET : 0);
+		return put_user((tmp ? WDIOF_CARDRESET : 0), p);
 	case WDIOC_SETOPTIONS:
 		if (get_user(tmp, p))
-			return -EINVAL;
+			return -EFAULT;
 		if (tmp == WDIOS_ENABLECARD) {
 			booke_wdt_ping();
 			break;
-- 
1.5.6

^ permalink raw reply related

* Re: [v2 PATCH 1/1] booke/wdt: some ioctls do not return values properly
From: Tabi Timur-B04825 @ 2012-08-07  2:20 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev@ozlabs.org, linux-watchdog@vger.kernel.org
In-Reply-To: <1344304780-13555-1-git-send-email-tiejun.chen@windriver.com>

On Mon, Aug 6, 2012 at 8:59 PM, Tiejun Chen <tiejun.chen@windriver.com> wrote:
> Fix some booke wdt ioctls return value error.
>
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>

It's not the greatest patch description, but it'll do.

Acked-by: Timur Tabi <timur@freescale.com>

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* RE: [PATCH v6 6/8] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
From: Liu Qiang-B32616 @ 2012-08-07  2:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Li Yang-R58472, vinod.koul@intel.com, gregkh@linuxfoundation.org,
	Tabi Timur-B04825, linux-kernel@vger.kernel.org,
	Phillips Kim-R1AAHA, dan.j.williams@gmail.com,
	herbert@gondor.hengli.com.au, linux-crypto@vger.kernel.org,
	dan.j.williams@intel.com, linuxppc-dev@lists.ozlabs.org,
	davem@davemloft.net
In-Reply-To: <201208061157.17667.arnd@arndb.de>

> -----Original Message-----
> From: Arnd Bergmann [mailto:arnd@arndb.de]
> Sent: Monday, August 06, 2012 7:57 PM
> To: Liu Qiang-B32616
> Cc: linux-crypto@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> dan.j.williams@intel.com; linux-kernel@vger.kernel.org;
> dan.j.williams@gmail.com; vinod.koul@intel.com; Phillips Kim-R1AAHA;
> herbert@gondor.hengli.com.au; davem@davemloft.net;
> gregkh@linuxfoundation.org; Li Yang-R58472; Tabi Timur-B04825
> Subject: Re: [PATCH v6 6/8] fsl-dma: use spin_lock_bh to instead of
> spin_lock_irqsave
>
> On Monday 06 August 2012, qiang.liu@freescale.com wrote:
> >
> > From: Qiang Liu <qiang.liu@freescale.com>
> >
> > The use of spin_lock_irqsave() is a stronger locking mechanism than is
> > required throughout the driver. The minimum locking required should be
> > used instead. Interrupts will be turned off and context will be saved,
> > there is needless to use irqsave.
> >
> > Change all instances of spin_lock_irqsave() to spin_lock_bh().
> > All manipulation of protected fields is done using tasklet context or
> > weaker, which makes spin_lock_bh() the correct choice.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Dan Williams <dan.j.williams@gmail.com>
> > Cc: Vinod Koul <vinod.koul@intel.com>
> > Cc: Li Yang <leoli@freescale.com>
> > Cc: Timur Tabi <timur@freescale.com>
> > Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
> > Acked-by: Ira W. Snyder <iws@ovro.caltech.edu>
>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> You could actually change the use of spin_lock_bh inside of the tasklet
> function (dma_do_tasklet) do just spin_lock(), because softirqs are
> already disabled there, but your version is also ok.
Yes, you are right, it will disable softirq.
Thank you very much.

^ permalink raw reply

* RE: [PATCH v6 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx
From: Liu Qiang-B32616 @ 2012-08-07  3:22 UTC (permalink / raw)
  To: Ira W. Snyder
  Cc: arnd@arndb.de, vinod.koul@intel.com, gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, dan.j.williams@gmail.com,
	herbert@gondor.hengli.com.au, linux-crypto@vger.kernel.org,
	dan.j.williams@intel.com, linuxppc-dev@lists.ozlabs.org,
	davem@davemloft.net
In-Reply-To: <20120806175115.GA23815@ovro.caltech.edu>

> -----Original Message-----
> From: Ira W. Snyder [mailto:iws@ovro.caltech.edu]
> Sent: Tuesday, August 07, 2012 1:51 AM
> To: Liu Qiang-B32616
> Cc: linux-crypto@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> dan.j.williams@intel.com; linux-kernel@vger.kernel.org;
> dan.j.williams@gmail.com; vinod.koul@intel.com; arnd@arndb.de;
> gregkh@linuxfoundation.org; herbert@gondor.hengli.com.au;
> davem@davemloft.net
> Subject: Re: [PATCH v6 5/8] fsl-dma: change release process of dma
> descriptor for supporting async_tx
>
> On Mon, Aug 06, 2012 at 06:14:33PM +0800, qiang.liu@freescale.com wrote:
> > From: Qiang Liu <qiang.liu@freescale.com>
> >
> > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > Async_tx is lack of support in current release process of dma
> descriptor,
> > all descriptors will be released whatever is acked or no-acked by
> async_tx,
> > so there is a potential race condition when dma engine is used by
> others
> > clients (e.g. when enable NET_DMA to offload TCP).
> >
> > In our case, a race condition which is raised when use both of talitos
> > and dmaengine to offload xor is because napi scheduler will sync all
> > pending requests in dma channels, it affects the process of raid
> operations
> > due to ack_tx is not checked in fsl dma. The no-acked descriptor is
> freed
> > which is submitted just now, as a dependent tx, this freed descriptor
> trigger
> > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> >
> > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf40000 CPU: 0
> > GPR00: 00000001 ecf41ca0 ee1a94a0 0000003f 00000001 c00593e4 00000000 00000001
> > GPR08: 00000000 a7a7a7a7 00000001 00000002 42028042 100a38d4 ed576d98 00000000
> > GPR16: ed5a11b0 00000000 2b162000 00000200 00000000 2d555000 ed3015e8 c15a7aa0
> > GPR24: 00000000 c155fc40 00000000 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > Call Trace:
> > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> >
> > Another modification in this patch is the change of completed
> descriptors,
> > there is a potential risk which caused by exception interrupt, all
> descriptors
> > in ld_running list are seemed completed when an interrupt raised, it
> works fine
> > under normal conditions, but if an exception occurs, it cannot
> work
> > as we expected. Hardware should not depend on the s/w list; the right
> way is
> > to read current descriptor address register to find the last completed
> > descriptor. If an interrupt is raised by an error, all descriptors in
> ld_running
> > should not be seemed finished, or these unfinished descriptors in
> ld_running
> > will be released wrongly.
> >
> > A simple way to reproduce,
> > Enable dmatest first, then insert some bad descriptors which can
> trigger
> > Programming Error interrupts before the good descriptors. Last, the
> good
> > descriptors will be freed before they are processed because of the
> exception
> > interrupt.
> >
> > Note: the bad descriptors are only for simulating an exception
> interrupt.
> > This case can illustrate the potential risk in current fsl-dma very
> well.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Dan Williams <dan.j.williams@gmail.com>
> > Cc: Vinod Koul <vinod.koul@intel.com>
> > Cc: Li Yang <leoli@freescale.com>
> > Signed-off-by: Qiang Liu <qiang.liu@freescale.com>
> > Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
>
> There are two minor nitpicks below. Other than that, the patch looks
> excellent to me.
>
> Ira
>
> > ---
> >  drivers/dma/fsldma.c |  232 ++++++++++++++++++++++++++++++++++--------
> --------
> >  drivers/dma/fsldma.h |   17 +++-
> >  2 files changed, 174 insertions(+), 75 deletions(-)
> >
> > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > index 36490a3..938d8c1 100644
> > --- a/drivers/dma/fsldma.c
> > +++ b/drivers/dma/fsldma.c
> > @@ -472,6 +472,110 @@ static struct fsl_desc_sw
> *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
> >  }
> >
> >  /**
> > + * fsldma_clean_completed_descriptor - free all descriptors which
> > + * has been completed and acked
> > + * @chan: Freescale DMA channel
> > + *
> > + * This function is used on all completed and acked descriptors.
> > + * All descriptors should only be freed in this function.
> > + */
> > +static void
> > +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> > +{
> > +	struct fsl_desc_sw *desc, *_desc;
> > +
> > +	/* Run the callback for each descriptor, in order */
> > +	list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node)
> > +		if (async_tx_test_ack(&desc->async_tx))
> > +			fsl_dma_free_descriptor(chan, desc);
> > +}
> > +
> > +/**
> > + * fsldma_run_tx_complete_actions - cleanup a single link descriptor
> > + * @chan: Freescale DMA channel
> > + * @desc: descriptor to cleanup and free
> > + * @cookie: Freescale DMA transaction identifier
> > + *
> > + * This function is used on a descriptor which has been executed by
> the DMA
> > + * controller. It will run any callbacks, submit any dependencies.
> > + */
> > +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsldma_chan
> *chan,
> > +		struct fsl_desc_sw *desc, dma_cookie_t cookie)
> > +{
> > +	struct dma_async_tx_descriptor *txd = &desc->async_tx;
> > +	struct device *dev = chan->common.device->dev;
> > +	dma_addr_t src = get_desc_src(chan, desc);
> > +	dma_addr_t dst = get_desc_dst(chan, desc);
> > +	u32 len = get_desc_cnt(chan, desc);
> > +
> > +	BUG_ON(txd->cookie < 0);
> > +
> > +	if (txd->cookie > 0) {
> > +		cookie = txd->cookie;
> > +
> > +		/* Run the link descriptor callback function */
> > +		if (txd->callback) {
> > +#ifdef FSL_DMA_LD_DEBUG
> > +			chan_dbg(chan, "LD %p callback\n", desc);
> > +#endif
> > +			txd->callback(txd->callback_param);
> > +		}
> > +
> > +		/* Unmap the dst buffer, if requested */
> > +		if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
> > +			if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
> > +				dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
> > +			else
> > +				dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
> > +		}
> > +
> > +		/* Unmap the src buffer, if requested */
> > +		if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
> > +			if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
> > +				dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
> > +			else
> > +				dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
> > +		}
> > +	}
> > +
> > +	/* Run any dependencies */
> > +	dma_run_dependencies(txd);
> > +
> > +	return cookie;
> > +}
> > +
> > +/**
> > + * fsldma_clean_running_descriptor - move the completed descriptor
> from
> > + * ld_running to ld_completed
> > + * @chan: Freescale DMA channel
> > + * @desc: the descriptor which is completed
> > + *
> > + * Free the descriptor directly if acked by async_tx api, or move it
> to
> > + * queue ld_completed.
> > + */
> > +static void
> > +fsldma_clean_running_descriptor(struct fsldma_chan *chan,
> > +		struct fsl_desc_sw *desc)
> > +{
> > +	/* Remove from the list of transactions */
> > +	list_del(&desc->node);
>
> Minor nitpick. Add a blank line here.
My fault. I will correct it.

>
> > +	/*
> > +	 * the client is allowed to attach dependent operations
> > +	 * until 'ack' is set
> > +	 */
> > +	if (!async_tx_test_ack(&desc->async_tx)) {
> > +		/*
> > +		 * Move this descriptor to the list of descriptors which is
> > +		 * completed, but still awaiting the 'ack' bit to be set.
> > +		 */
> > +		list_add_tail(&desc->node, &chan->ld_completed);
> > +		return;
> > +	}
> > +
> > +	dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
> > +}
> > +
> > +/**
> >   * fsl_chan_xfer_ld_queue - transfer any pending transactions
> >   * @chan : Freescale DMA channel
> >   *
> > @@ -539,51 +643,58 @@ static void fsl_chan_xfer_ld_queue(struct
> fsldma_chan *chan)
> >  }
> >
> >  /**
> > - * fsldma_cleanup_descriptor - cleanup and free a single link
> descriptor
> > + * fsldma_cleanup_descriptors - cleanup link descriptors which are
> completed
> > + * and move them to ld_completed to free until flag 'ack' is set
> >   * @chan: Freescale DMA channel
> > - * @desc: descriptor to cleanup and free
> >   *
> > - * This function is used on a descriptor which has been executed by
> the DMA
> > - * controller. It will run any callbacks, submit any dependencies, and
> then
> > - * free the descriptor.
> > + * This function is used on descriptors which have been executed by
> the DMA
> > + * controller. It will run any callbacks, submit any dependencies,
> then
> > + * free these descriptors if flag 'ack' is set.
> >   */
> > -static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
> > -				      struct fsl_desc_sw *desc)
> > +static void fsldma_cleanup_descriptors(struct fsldma_chan *chan)
> >  {
> > -	struct dma_async_tx_descriptor *txd = &desc->async_tx;
> > -	struct device *dev = chan->common.device->dev;
> > -	dma_addr_t src = get_desc_src(chan, desc);
> > -	dma_addr_t dst = get_desc_dst(chan, desc);
> > -	u32 len = get_desc_cnt(chan, desc);
> > +	struct fsl_desc_sw *desc, *_desc;
> > +	dma_cookie_t cookie = 0;
> > +	dma_addr_t curr_phys = get_cdar(chan);
> > +	int seen_current = 0;
> >
> > -	/* Run the link descriptor callback function */
> > -	if (txd->callback) {
> > -#ifdef FSL_DMA_LD_DEBUG
> > -		chan_dbg(chan, "LD %p callback\n", desc);
> > -#endif
> > -		txd->callback(txd->callback_param);
> > -	}
> > +	fsldma_clean_completed_descriptor(chan);
> >
> > -	/* Run any dependencies */
> > -	dma_run_dependencies(txd);
> > +	/* Run the callback for each descriptor, in order */
> > +	list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> > +		/*
> > +		 * do not advance past the current descriptor loaded into the
> > +		 * hardware channel, subsequent descriptors are either in
> > +		 * process or have not been submitted
> > +		 */
> > +		if (seen_current)
> > +			break;
> >
> > -	/* Unmap the dst buffer, if requested */
> > -	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
> > -		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
> > -			dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
> > -		else
> > -			dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
> > -	}
> > +		/*
> > +		 * stop the search if we reach the current descriptor and the
> > +		 * channel is busy
> > +		 */
> > +		if (desc->async_tx.phys == curr_phys) {
> > +			seen_current = 1;
> > +			if (!dma_is_idle(chan))
> > +				break;
> > +		}
>
> I wonder if this is better:
>
> if (desc->async_tx.phys == get_cdar(chan)) {
> 	seen_current = 1;
> 	if (!dma_is_idle(chan))
> 		break;
> }
>
> Your version works fine, it just might stop earlier than necessary. This
> is just a nitpick.
There are two reasons I did not change it the way you suggested in v6.
First, reading the current address register is contended: the DMA device
refills the current address continually, and there is an arbitration
mechanism when we read this register, so reading it more often may not
improve throughput.
Second, as far as I know, a normal interrupt usually means the whole list
has completed, not just a single descriptor, so using the latest address
should give no obvious improvement. The current address is always the
same; it points to the last descriptor of the list.
Sorry, I should have explained this clearly in v5. Thanks.
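
For readers following the discussion, the walk over ld_running can be
modelled in userspace roughly as below. This is only a sketch under stated
assumptions: a flat array stands in for the list, and demo_desc and
demo_count_completed() are hypothetical names that do not appear in the
patch. As in v6, the current descriptor address is sampled once by the
caller and passed in as curr_phys, rather than re-read on every iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: one entry per descriptor in ld_running. */
struct demo_desc {
	uint32_t phys;	/* bus address as the hardware would report it */
};

/*
 * Count how many leading descriptors may be cleaned up, following the
 * loop structure of fsldma_cleanup_descriptors(): the walk never goes
 * past the descriptor named by curr_phys, and it does not include that
 * descriptor while the channel is still busy.
 */
static size_t demo_count_completed(const struct demo_desc *ring, size_t n,
				   uint32_t curr_phys, bool chan_idle)
{
	size_t done = 0;
	bool seen_current = false;

	for (size_t i = 0; i < n; i++) {
		/* Anything after the current descriptor is not finished yet. */
		if (seen_current)
			break;

		if (ring[i].phys == curr_phys) {
			seen_current = true;
			/* The current descriptor is still executing: stop. */
			if (!chan_idle)
				break;
		}
		done++;
	}
	return done;
}
```

With three descriptors at 0x100, 0x200 and 0x300 and curr_phys at 0x200,
the model reports one completed descriptor while the channel is busy and two
once it is idle. In either case the third descriptor stays on the running
list.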

>
> >
> > -	/* Unmap the src buffer, if requested */
> > -	if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
> > -		if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
> > -			dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
> > -		else
> > -			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
> > +		cookie = fsldma_run_tx_complete_actions(chan, desc, cookie);
> > +
> > +		fsldma_clean_running_descriptor(chan, desc);
> >  	}
> >
> > -	fsl_dma_free_descriptor(chan, desc);
> > +	/*
> > +	 * Start any pending transactions automatically
> > +	 *
> > +	 * In the ideal case, we keep the DMA controller busy while we go
> > +	 * ahead and free the descriptors below.
> > +	 */
> > +	fsl_chan_xfer_ld_queue(chan);
> > +
> > +	if (cookie > 0)
> > +		chan->common.completed_cookie = cookie;
> >  }
> >
> >  /**
> > @@ -654,8 +765,10 @@ static void fsl_dma_free_chan_resources(struct
> dma_chan *dchan)
> >
> >  	chan_dbg(chan, "free all channel resources\n");
> >  	spin_lock_irqsave(&chan->desc_lock, flags);
> > +	fsldma_cleanup_descriptors(chan);
> >  	fsldma_free_desc_list(chan, &chan->ld_pending);
> >  	fsldma_free_desc_list(chan, &chan->ld_running);
> > +	fsldma_free_desc_list(chan, &chan->ld_completed);
> >  	spin_unlock_irqrestore(&chan->desc_lock, flags);
> >
> >  	dma_pool_destroy(chan->desc_pool);
> > @@ -893,6 +1006,7 @@ static int fsl_dma_device_control(struct dma_chan
> *dchan,
> >  		/* Remove and free all of the descriptors in the LD queue */
> >  		fsldma_free_desc_list(chan, &chan->ld_pending);
> >  		fsldma_free_desc_list(chan, &chan->ld_running);
> > +		fsldma_free_desc_list(chan, &chan->ld_completed);
> >  		chan->idle = true;
> >
> >  		spin_unlock_irqrestore(&chan->desc_lock, flags);
> > @@ -956,11 +1070,15 @@ static enum dma_status fsl_tx_status(struct
> dma_chan *dchan,
> >  	enum dma_status ret;
> >  	unsigned long flags;
> >
> > -	spin_lock_irqsave(&chan->desc_lock, flags);
> >  	ret = dma_cookie_status(dchan, cookie, txstate);
> > +	if (ret == DMA_SUCCESS)
> > +		return ret;
> > +
> > +	spin_lock_irqsave(&chan->desc_lock, flags);
> > +	fsldma_cleanup_descriptors(chan);
> >  	spin_unlock_irqrestore(&chan->desc_lock, flags);
> >
> > -	return ret;
> > +	return dma_cookie_status(dchan, cookie, txstate);
> >  }
> >
> >  /*--------------------------------------------------------------------
> --------*/
> > @@ -1037,52 +1155,19 @@ static irqreturn_t fsldma_chan_irq(int irq,
> void *data)
> >  static void dma_do_tasklet(unsigned long data)
> >  {
> >  	struct fsldma_chan *chan = (struct fsldma_chan *)data;
> > -	struct fsl_desc_sw *desc, *_desc;
> > -	LIST_HEAD(ld_cleanup);
> >  	unsigned long flags;
> >
> >  	chan_dbg(chan, "tasklet entry\n");
> >
> >  	spin_lock_irqsave(&chan->desc_lock, flags);
> >
> > -	/* update the cookie if we have some descriptors to cleanup */
> > -	if (!list_empty(&chan->ld_running)) {
> > -		dma_cookie_t cookie;
> > -
> > -		desc = to_fsl_desc(chan->ld_running.prev);
> > -		cookie = desc->async_tx.cookie;
> > -		dma_cookie_complete(&desc->async_tx);
> > -
> > -		chan_dbg(chan, "completed_cookie=%d\n", cookie);
> > -	}
> > -
> > -	/*
> > -	 * move the descriptors to a temporary list so we can drop the lock
> > -	 * during the entire cleanup operation
> > -	 */
> > -	list_splice_tail_init(&chan->ld_running, &ld_cleanup);
> > -
> >  	/* the hardware is now idle and ready for more */
> >  	chan->idle = true;
> >
> > -	/*
> > -	 * Start any pending transactions automatically
> > -	 *
> > -	 * In the ideal case, we keep the DMA controller busy while we go
> > -	 * ahead and free the descriptors below.
> > -	 */
> > -	fsl_chan_xfer_ld_queue(chan);
> > -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> > -
> > -	/* Run the callback for each descriptor, in order */
> > -	list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
> > +	/* Run all cleanup for descriptors which have been completed */
> > +	fsldma_cleanup_descriptors(chan);
> >
> > -		/* Remove from the list of transactions */
> > -		list_del(&desc->node);
> > -
> > -		/* Run all cleanup for this descriptor */
> > -		fsldma_cleanup_descriptor(chan, desc);
> > -	}
> > +	spin_unlock_irqrestore(&chan->desc_lock, flags);
> >
> >  	chan_dbg(chan, "tasklet exit\n");
> >  }
> > @@ -1264,6 +1349,7 @@ static int __devinit fsl_dma_chan_probe(struct
> fsldma_device *fdev,
> >  	spin_lock_init(&chan->desc_lock);
> >  	INIT_LIST_HEAD(&chan->ld_pending);
> >  	INIT_LIST_HEAD(&chan->ld_running);
> > +	INIT_LIST_HEAD(&chan->ld_completed);
> >  	chan->idle = true;
> >
> >  	chan->common.device = &fdev->common;
> > diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
> > index f5c3879..a58275a 100644
> > --- a/drivers/dma/fsldma.h
> > +++ b/drivers/dma/fsldma.h
> > @@ -138,8 +138,21 @@ struct fsldma_chan {
> >  	char name[8];			/* Channel name */
> >  	struct fsldma_chan_regs __iomem *regs;
> >  	spinlock_t desc_lock;		/* Descriptor operation lock */
> > -	struct list_head ld_pending;	/* Link descriptors queue */
> > -	struct list_head ld_running;	/* Link descriptors queue */
> > +	/*
> > +	 * Descriptors which are queued to run, but have not yet been
> > +	 * submitted to the hardware for execution
> > +	 */
> > +	struct list_head ld_pending;
> > +	/*
> > +	 * Descriptors which are currently being executed by the hardware
> > +	 */
> > +	struct list_head ld_running;
> > +	/*
> > +	 * Descriptors which have finished execution by the hardware. These
> > +	 * descriptors have already had their cleanup actions run. They are
> > +	 * waiting for the ACK bit to be set by the async_tx API.
> > +	 */
> > +	struct list_head ld_completed;	/* Link descriptors queue */
> >  	struct dma_chan common;		/* DMA common channel */
> >  	struct dma_pool *desc_pool;	/* Descriptors pool */
> >  	struct device *dev;		/* Channel device */
> > --
> > 1.7.5.1
> >
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
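
The descriptor lifecycle introduced by the quoted patch (running, then
either freed at once or parked on ld_completed until the ACK bit is set) can
be sketched in userspace as follows. This is an illustrative model, not the
driver: demo_ld, demo_finish() and demo_sweep() are hypothetical names, and
plain state flags replace the kernel lists and the DMA pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_RUNNING, DEMO_COMPLETED, DEMO_FREED };

/* Hypothetical stand-in for a link descriptor and its async_tx ACK flag. */
struct demo_ld {
	enum demo_state state;
	bool acked;
};

/* Models fsldma_clean_running_descriptor(): free if acked, else park. */
static void demo_finish(struct demo_ld *d)
{
	d->state = d->acked ? DEMO_FREED : DEMO_COMPLETED;
}

/* Models fsldma_clean_completed_descriptor(): free parked, acked entries. */
static size_t demo_sweep(struct demo_ld *descs, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (descs[i].state == DEMO_COMPLETED && descs[i].acked) {
			descs[i].state = DEMO_FREED;
			freed++;
		}
	}
	return freed;
}
```

An unacked descriptor therefore survives completion and is only released by
a later sweep, after the client has set the ACK flag. This is the property
that keeps async_tx_submit() from seeing a freed dependency.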

^ permalink raw reply

* RE: [PATCH v6 0/8] Raid: enable talitos xor offload for improving performance
From: Liu Qiang-B32616 @ 2012-08-07  3:27 UTC (permalink / raw)
  To: Phillips Kim-R1AAHA
  Cc: Li Yang-R58472, arnd@arndb.de, vinod.koul@intel.com,
	gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	dan.j.williams@gmail.com, herbert@gondor.hengli.com.au,
	linux-crypto@vger.kernel.org, dan.j.williams@intel.com,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20120806203506.bcf31cc63a2d1f55a9695f13@freescale.com>

> -----Original Message-----
> From: Phillips Kim-R1AAHA
> Sent: Tuesday, August 07, 2012 9:35 AM
> To: Liu Qiang-B32616
> Cc: linux-crypto@vger.kernel.org; vinod.koul@intel.com;
> dan.j.williams@intel.com; herbert@gondor.hengli.com.au; arnd@arndb.de;
> gregkh@linuxfoundation.org; linuxppc-dev@lists.ozlabs.org;
> linux-kernel@vger.kernel.org; dan.j.williams@gmail.com; Li Yang-R58472
> Subject: Re: [PATCH v6 0/8] Raid: enable talitos xor offload for
> improving performance
>
> On Mon, 6 Aug 2012 18:10:15 +0800
> <qiang.liu@freescale.com> wrote:
>
> > Changes in v6:
> > 	- swap the order of original patch 3/6 and 4/6;
> > 	- merge Ira's patch to reduce the size of original patch;
> > 	- merge Ira's patch of carma in 8/8;
> > 	- update documents and descriptions according to Ira's advice;
>
> fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks [1],
> and this patchseries, along with FSL_DMA && NET_DMA set seems to be
> holding water, so this series gets my:
>
> Tested-by: Kim Phillips <kim.phillips@freescale.com>
Thanks, Kim. I will add this line in v7:)

>
> Thanks,
>
> Kim
>
> [1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-devices=4 /dev/ram[0123]

^ permalink raw reply

* RE: [PATCH v2] PCI: use dev->irq instead of dev->pin to enable non MSI/INTx interrupt
From: Zang Roy-R61911 @ 2012-08-07  3:45 UTC (permalink / raw)
  To: Liu Shengzhou-B36685, bhelgaas@google.com,
	linux-pci@vger.kernel.org, akpm@linux-foundation.org, Kumar Gala
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org,
	Liu Shengzhou-B36685
In-Reply-To: <3F453DDFF675A64A89321A1F35281021771A8B@039-SN1MPN1-003.039d.mgd.msft.net>



> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-bounces+tie-fei.zang=freescale.com@lists.ozlabs.org]
> On Behalf Of Liu Shengzhou-B36685
> Sent: Thursday, July 26, 2012 11:45 AM
> To: bhelgaas@google.com; linux-pci@vger.kernel.org;
> akpm@linux-foundation.org
> Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org; Liu Shengzhou-B36685
> Subject: RE: [PATCH v2] PCI: use dev->irq instead of dev->pin to enable non
> MSI/INTx interrupt
>
> Hello,
>
> A gentle reminder!
> Any comments are appreciated.

Who can help to review and pick up this patch?
Thanks.
Roy

^ permalink raw reply

* Re: [RFC PATCH V6 15/19] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
From: Wen Congyang @ 2012-08-07  3:48 UTC (permalink / raw)
  To: isimatu.yasuaki
  Cc: linux-s390, linux-ia64, linux-acpi, len.brown, linux-sh,
	linux-kernel, cmetcalf, linux-mm, paulus, minchan.kim,
	kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm, liuj97
In-Reply-To: <1343980161-14254-16-git-send-email-wency@cn.fujitsu.com>

At 08/03/2012 03:49 PM, wency@cn.fujitsu.com Wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> For removing memmap region of sparse-vmemmap which is allocated bootmem,
> memmap region of sparse-vmemmap needs to be registered by get_page_bootmem().
> So the patch searches pages of virtual mapping and registers the pages by
> get_page_bootmem().
> 
> Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
> and sparc.
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> ---
>  arch/ia64/mm/discontig.c       |    6 ++++
>  arch/powerpc/mm/init_64.c      |    6 ++++
>  arch/s390/mm/vmem.c            |    6 ++++
>  arch/sparc/mm/init_64.c        |    6 ++++
>  arch/x86/mm/init_64.c          |   52 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/memory_hotplug.h |    2 +
>  include/linux/mm.h             |    3 +-
>  mm/memory_hotplug.c            |   23 +++++++++++++++--
>  8 files changed, 100 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index c641333..33943db 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct page *start_page,
>  {
>  	return vmemmap_populate_basepages(start_page, size, node);
>  }
> +
> +void register_page_bootmem_memmap(unsigned long section_nr,
> +				  struct page *start_page, unsigned long size)
> +{
> +	/* TODO */
> +}
>  #endif
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 620b7ac..3690c44 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -298,5 +298,11 @@ int __meminit vmemmap_populate(struct page *start_page,
>  
>  	return 0;
>  }
> +
> +void register_page_bootmem_memmap(unsigned long section_nr,
> +				  struct page *start_page, unsigned long size)
> +{
> +	/* TODO */
> +}
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index 6f896e7..eda55cd 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -227,6 +227,12 @@ out:
>  	return ret;
>  }
>  
> +void register_page_bootmem_memmap(unsigned long section_nr,
> +				  struct page *start_page, unsigned long size)
> +{
> +	/* TODO */
> +}
> +
>  /*
>   * Add memory segment to the segment list if it doesn't overlap with
>   * an already present segment.
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 6026fdd..53f7604 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2059,6 +2059,12 @@ int __meminit vmemmap_populate(struct page *start, unsigned long nr, int node)
>  	}
>  	return 0;
>  }
> +
> +void register_page_bootmem_memmap(unsigned long section_nr,
> +				  struct page *start_page, unsigned long size)
> +{
> +	/* TODO */
> +}
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
>  static void prot_init_common(unsigned long page_none,
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index e0d88ba..0075592 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1138,6 +1138,58 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
>  	return 0;
>  }
>  
> +void register_page_bootmem_memmap(unsigned long section_nr,
> +				  struct page *start_page, unsigned long size)
> +{
> +	unsigned long addr = (unsigned long)start_page;
> +	unsigned long end = (unsigned long)(start_page + size);
> +	unsigned long next;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	for (; addr < end; addr = next) {
> +		pte_t *pte = NULL;
> +
> +		pgd = pgd_offset_k(addr);
> +		if (pgd_none(*pgd)) {
> +			next = (addr + PAGE_SIZE) & PAGE_MASK;
> +			continue;
> +		}
> +		get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
> +
> +		pud = pud_offset(pgd, addr);
> +		if (pud_none(*pud)) {
> +			next = (addr + PAGE_SIZE) & PAGE_MASK;
> +			continue;
> +		}
> +		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
> +
> +		if (!cpu_has_pse) {
> +			next = (addr + PAGE_SIZE) & PAGE_MASK;
> +			pmd = pmd_offset(pud, addr);
> +			if (pmd_none(*pmd))
> +				continue;
> +			get_page_bootmem(section_nr, pmd_page(*pmd),
> +					 MIX_SECTION_INFO);
> +
> +			pte = pte_offset_kernel(pmd, addr);
> +			if (pte_none(*pte))
> +				continue;
> +			get_page_bootmem(section_nr, pte_page(*pte),
> +					 SECTION_INFO);
> +		} else {
> +			next = pmd_addr_end(addr, end);
> +
> +			pmd = pmd_offset(pud, addr);
> +			if (pmd_none(*pmd))
> +				continue;
> +			get_page_bootmem(section_nr, pmd_page(*pmd),
> +					 SECTION_INFO);
> +		}
> +	}
> +}
> +
>  void __meminit vmemmap_populate_print_last(void)
>  {
>  	if (p_start) {
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 1133e63..2d18235 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -164,6 +164,8 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
>  
>  extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
>  extern void put_page_bootmem(struct page *page);
> +extern void get_page_bootmem(unsigned long info, struct page *page,
> +			     unsigned long type);
>  
>  /*
>   * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 311be90..c607913 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1618,7 +1618,8 @@ int vmemmap_populate_basepages(struct page *start_page,
>  						unsigned long pages, int node);
>  int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
>  void vmemmap_populate_print_last(void);
> -
> +void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
> +				  unsigned long size);
>  
>  enum mf_flags {
>  	MF_COUNT_INCREASED = 1 << 0,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3a264a5..4589f5b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -91,8 +91,8 @@ static void release_memory_resource(struct resource *res)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> -static void get_page_bootmem(unsigned long info,  struct page *page,
> -			     unsigned long type)
> +void get_page_bootmem(unsigned long info,  struct page *page,
> +		      unsigned long type)
>  {
>  	unsigned long page_type;
>  
> @@ -164,8 +164,25 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
>  
>  }
>  #else
> -static inline void register_page_bootmem_info_section(unsigned long start_pfn)
> +static void register_page_bootmem_info_section(unsigned long start_pfn)
>  {
> +	unsigned long mapsize, section_nr;
> +	struct mem_section *ms;
> +	struct page *page, *memmap;
> +
> +	if (!pfn_valid(start_pfn))
> +		return;
> +
> +	section_nr = pfn_to_section_nr(start_pfn);
> +	ms = __nr_to_section(section_nr);
> +
> +	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +
> +	page = virt_to_page(memmap);
> +	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
> +	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
> +
> +	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);

You only handle memmap here. I think usemap should also be handled here.

Thanks
Wen Congyang

>  }
>  #endif
>  

^ permalink raw reply

* Re: [PATCH V5 3/3] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Li Yang @ 2012-08-07  4:20 UTC (permalink / raw)
  To: Scott Wood
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472,
	Jia Hongtao-B38951
In-Reply-To: <501FDE40.1060906@freescale.com>

On Mon, Aug 6, 2012 at 11:09 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 08/05/2012 10:07 PM, Jia Hongtao-B38951 wrote:
>>
>>
>>> -----Original Message-----
>>> From: Wood Scott-B07421
>>> Sent: Saturday, August 04, 2012 12:28 AM
>>> To: Jia Hongtao-B38951
>>> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org; Li Yang-
>>> R58472; Wood Scott-B07421
>>> Subject: Re: [PATCH V5 3/3] powerpc/fsl-pci: Unify pci/pcie
>>> initialization code
>>>
>>> On 08/03/2012 05:14 AM, Jia Hongtao wrote:
>>>> -void __devinit fsl_pci_init(void)
>>>> +/* Checkout if PCI contains ISA node */
>>>> +static int of_pci_has_isa(struct device_node *pci_node)
>>>> +{
>>>> +   struct device_node *np;
>>>> +   int ret = 0;
>>>> +
>>>> +   if (!pci_node)
>>>> +           return 0;
>>>> +
>>>> +   read_lock(&devtree_lock);
>>>> +   np = pci_node->allnext;
>>>> +
>>>> +   /* Only scan the children of PCI node */
>>>> +   for (; np != pci_node->sibling; np = np->allnext) {
>>>> +           if (np->type && (of_node_cmp(np->type, "isa") == 0)
>>>> +               && of_node_get(np)) {
>>>> +                   ret = 1;
>>>> +                   break;
>>>> +           }
>>>> +   }
>>>> +
>>>> +   of_node_put(pci_node);
>>>> +   read_unlock(&devtree_lock);
>>>> +
>>>> +   return ret;
>>>> +}
>>>
>>> Why do you keep insisting on substituting your ISA search code here?
>>> What advantages does it have over the code that is already there?  It
>>> unnecessarily digs into the internals of the tree representation.
>>>
>>
>> I want ISA search is done from probe.
>
> Too bad.  You're breaking the case where there's no ISA node.
>

We can also take care of special cases with our approach if needed.
But it's not correct to assume the first PCI controller is the primary
one if there is no ISA node.  Your approach is still a band-aid to me.
 We can come back to this issue when we do find a proper solution.

>> Also this way is more efficient due
>> to we just search the children of PCI.
>
> It is not more efficient, because you're doing the search for every PCIe
> bus rather than once.  Not that it matters in this context.

We end up scanning at most a few PCI nodes instead of the whole device
tree for the primary.

>
>>>> +
>>>> +static int __devinit fsl_pci_probe(struct platform_device *pdev)
>>>>  {
>>>>     int ret;
>>>> -   struct device_node *node;
>>>>     struct pci_controller *hose;
>>>> -   dma_addr_t max = 0xffffffff;
>>>> +   int is_primary = 0;
>>>>
>>>> -   /* Callers can specify the primary bus using other means. */
>>>>     if (!fsl_pci_primary) {
>>>> -           /* If a PCI host bridge contains an ISA node, it's primary.
>>> */
>>>> -           node = of_find_node_by_type(NULL, "isa");
>>>> -           while ((fsl_pci_primary = of_get_parent(node))) {
>>>> -                   of_node_put(node);
>>>> -                   node = fsl_pci_primary;
>>>> -
>>>> -                   if (of_match_node(pci_ids, node))
>>>> -                           break;
>>>> -           }
>>>> +           is_primary = of_pci_has_isa(pdev->dev.of_node);
>>>> +           if (is_primary)
>>>> +                   fsl_pci_primary = pdev->dev.of_node;
>>>>     }
>>>
>>> As I explained before, this has to be done globally, not from the probe
>>> function, so we can assign a default primary bus if there isn't any ISA.
>>>  There are bugs in the Linux PPC PCI code relating to not having any
>>> primary bus.
>>>
>>> -Scott
>>
>> In my way of searching ISA you can also assign a default primary bus in board
>> specific files.
>
> That was meant for when the board file had an alternate way of searching
> for the primary bus (e.g. look for i8259), not as a replacement for the
> mechanism that guarantees there's a primary bus.
>
> You are causing a regression in the qemu_e500.c platform.

Can we fix the qemu device tree to address the problem if we do make
it a rule to use the ISA node to indicate the primary bus?

- Leo
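
For readers following the disagreement, here is a minimal userspace sketch of the lookup being discussed: find a node of type "isa", then walk up its parents until a PCI host bridge is reached. The toy_node type and find_primary_by_isa() are hypothetical stand-ins for illustration, not the kernel's device tree API, and the sketch deliberately returns NULL when there is no ISA node so that the fallback policy under debate stays with the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device tree node: only a type and a parent. */
struct toy_node {
	const char *type;
	struct toy_node *parent;
};

/*
 * Scan a flat array of nodes for the first "isa" node and return its
 * nearest ancestor of type "pci". Returns NULL when no ISA node sits
 * under a PCI bridge; choosing a default primary bus in that case is
 * left to the caller.
 */
static struct toy_node *find_primary_by_isa(struct toy_node *nodes, size_t n)
{
	size_t i;
	struct toy_node *p;

	for (i = 0; i < n; i++) {
		if (strcmp(nodes[i].type, "isa") != 0)
			continue;
		for (p = nodes[i].parent; p != NULL; p = p->parent)
			if (strcmp(p->type, "pci") == 0)
				return p;
	}
	return NULL;
}
```

Because the search is done once over the whole tree, a caller can see the "no ISA node at all" case in one place, which is the situation the qemu_e500 regression report is about.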

^ permalink raw reply

* Re: [RFC PATCH V6 16/19] memory-hotplug: free memmap of sparse-vmemmap
From: Wen Congyang @ 2012-08-07  5:17 UTC (permalink / raw)
  To: wency
  Cc: linux-s390, linux-ia64, linux-acpi, len.brown, linux-sh,
	linux-kernel, cmetcalf, linux-mm, isimatu.yasuaki, paulus,
	minchan.kim, kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm,
	liuj97
In-Reply-To: <1343980161-14254-17-git-send-email-wency@cn.fujitsu.com>

At 08/03/2012 03:49 PM, wency@cn.fujitsu.com Wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>

This line is wrong. This patch is from Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

> 
> Not all pages of the virtual mapping in removed memory can be freed, since
> some pages used as PGD/PUD map not only the removed memory but also other
> memory. So the patch checks whether a page can be freed or not.
> 
> How to check whether a page can be freed or not?
>  1. When removing memory, the page structs of the removed memory are filled
>     with 0xFD.
>  2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
>     entry can be cleared. In this case, the page used as PT/PMD can be freed.
> 
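
The two steps above can be sketched in userspace as follows. This is only an illustration under stated assumptions: first_byte_not() is a hypothetical stand-in for the kernel's memchr_inv(), and mark_removed_and_check() is an invented helper name. The marker value 0xFD is the PAGE_INUSE value defined in the patch.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD	/* marker value used by the patch */

/*
 * Userspace stand-in for memchr_inv(): return the first byte in buf
 * that differs from c, or NULL if all n bytes equal c.
 */
static const void *first_byte_not(const void *buf, int c, size_t n)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/*
 * Step 1: fill the removed range [off, off + len) with the marker.
 * Step 2: report whether the whole backing page now holds only the
 * marker. Returns 1 if the page could be freed, 0 if some part of it
 * is still in use by other memory.
 */
static int mark_removed_and_check(unsigned char *page, size_t page_size,
				  size_t off, size_t len)
{
	memset(page + off, PAGE_INUSE, len);
	return first_byte_not(page, PAGE_INUSE, page_size) == NULL;
}
```

A backing page shared by two sections becomes freeable only after both halves have been marked, which matches the "used by other" case described in the patch comments.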
> Applying patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is integrated
> into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is deleted.
> 
> Note:  vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64,
> ppc, s390, and sparc.
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> ---
>  arch/ia64/mm/discontig.c  |    8 +++
>  arch/powerpc/mm/init_64.c |    8 +++
>  arch/s390/mm/vmem.c       |    8 +++
>  arch/sparc/mm/init_64.c   |    8 +++
>  arch/x86/mm/init_64.c     |  119 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mm.h        |    2 +
>  mm/memory_hotplug.c       |   17 +------
>  mm/sparse.c               |    5 +-
>  8 files changed, 158 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index 33943db..0d23b69 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>  	return vmemmap_populate_basepages(start_page, size, node);
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 3690c44..835a2b3 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>  	return 0;
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index eda55cd..4b42b0b 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -227,6 +227,14 @@ out:
>  	return ret;
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 53f7604..d444f25 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2060,6 +2060,14 @@ int __meminit vmemmap_populate(struct page *start, unsigned long nr, int node)
>  	return 0;
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 0075592..4e8f8a4 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1138,6 +1138,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
>  	return 0;
>  }
>  
> +#define PAGE_INUSE 0xFD
> +
> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
> +			    struct page **pp, int *page_size)
> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	void *page_addr;
> +	unsigned long next;
> +
> +	*pp = NULL;
> +
> +	pgd = pgd_offset_k(addr);
> +	if (pgd_none(*pgd))
> +		return pgd_addr_end(addr, end);
> +
> +	pud = pud_offset(pgd, addr);
> +	if (pud_none(*pud))
> +		return pud_addr_end(addr, end);
> +
> +	if (!cpu_has_pse) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		pte = pte_offset_kernel(pmd, addr);
> +		if (pte_none(*pte))
> +			return next;
> +
> +		*page_size = PAGE_SIZE;
> +		*pp = pte_page(*pte);
> +	} else {
> +		next = pmd_addr_end(addr, end);
> +
> +		pmd = pmd_offset(pud, addr);
> +		if (pmd_none(*pmd))
> +			return next;
> +
> +		*page_size = PMD_SIZE;
> +		*pp = pmd_page(*pmd);
> +	}
> +
> +	/*
> +	 * Removed page structs are filled with 0xFD.
> +	 */
> +	memset((void *)addr, PAGE_INUSE, next - addr);
> +
> +	page_addr = page_address(*pp);
> +
> +	/*
> +	 * Check the page is filled with 0xFD or not.
> +	 * memchr_inv() returns the address. In this case, we cannot
> +	 * clear PTE/PUD entry, since the page is used by other.
> +	 * So we cannot also free the page.
> +	 *
> +	 * memchr_inv() returns NULL. In this case, we can clear
> +	 * PTE/PUD entry, since the page is not used by other.
> +	 * So we can also free the page.
> +	 */
> +	if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
> +		*pp = NULL;
> +		return next;
> +	}
> +
> +	if (!cpu_has_pse)
> +		pte_clear(&init_mm, addr, pte);
> +	else
> +		pmd_clear(pmd);
> +
> +	return next;
> +}
> +
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	int page_size;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		page_size = 0;
> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
> +		if (!page)
> +			continue;
> +
> +		free_pages((unsigned long)page_address(page),
> +			    get_order(page_size));
> +		__flush_tlb_one(addr);
> +	}
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +	unsigned long addr = (unsigned long)memmap;
> +	unsigned long end = (unsigned long)(memmap + nr_pages);
> +	unsigned long next;
> +	struct page *page;
> +	int page_size;
> +	unsigned long magic;
> +
> +	for (; addr < end; addr = next) {
> +		page = NULL;
> +		page_size = 0;
> +		next = find_and_clear_pte_page(addr, end, &page, &page_size);
> +		if (!page)
> +			continue;
> +
> +		magic = (unsigned long) page->lru.next;
> +		if (magic == SECTION_INFO)
> +			put_page_bootmem(page);
> +		flush_tlb_kernel_range(addr, end);
> +	}
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
>  				  struct page *start_page, unsigned long size)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c607913..fb0d1fc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1620,6 +1620,8 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
>  void vmemmap_populate_print_last(void);
>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>  				  unsigned long size);
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages);
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages);
>  
>  enum mf_flags {
>  	MF_COUNT_INCREASED = 1 << 0,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4589f5b..a1f3490 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -300,19 +300,6 @@ static int __meminit __add_section(int nid, struct zone *zone,
>  	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
>  }
>  
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static int __remove_section(struct zone *zone, struct mem_section *ms)
> -{
> -	int ret = -EINVAL;
> -
> -	if (!valid_section(ms))
> -		return ret;
> -
> -	ret = unregister_memory_section(ms);
> -
> -	return ret;
> -}
> -#else
>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>  {
>  	unsigned long flags;
> @@ -329,9 +316,9 @@ static int __remove_section(struct zone *zone, struct mem_section *ms)
>  	pgdat_resize_lock(pgdat, &flags);
>  	sparse_remove_one_section(zone, ms);
>  	pgdat_resize_unlock(pgdat, &flags);
> -	return 0;
> +
> +	return ret;
>  }
> -#endif
>  
>  /*
>   * Reasonably generic function for adding memory.  It is
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fac95f2..ab9d755 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -613,12 +613,13 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
>  	/* This will make the necessary allocations eventually. */
>  	return sparse_mem_map_populate(pnum, nid);
>  }
> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>  {
> -	return; /* XXX: Not implemented yet */
> +	vmemmap_kfree(page, nr_pages);
>  }
>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>  {
> +	vmemmap_free_bootmem(page, nr_pages);
>  }
>  #else
>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)

^ permalink raw reply

* Re: [RFC PATCH V6 13/19] memory-hotplug: check page type in get_page_bootmem
From: Wen Congyang @ 2012-08-07  5:31 UTC (permalink / raw)
  To: isimatu.yasuaki
  Cc: linux-s390, linux-ia64, linux-acpi, len.brown, linux-sh,
	linux-kernel, cmetcalf, linux-mm, paulus, minchan.kim,
	kosaki.motohiro, rientjes, cl, linuxppc-dev, akpm, liuj97
In-Reply-To: <1343980161-14254-14-git-send-email-wency@cn.fujitsu.com>

At 08/03/2012 03:49 PM, wency@cn.fujitsu.com Wrote:
> From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 
> There is a possibility that get_page_bootmem() is called to the same page many
> times. So when get_page_bootmem is called to the same page, the function only
> increments page->_count.
> 
> CC: David Rientjes <rientjes@google.com>
> CC: Jiang Liu <liuj97@gmail.com>
> CC: Len Brown <len.brown@intel.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> ---
>  mm/memory_hotplug.c |   15 +++++++++++----
>  1 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 5f9f8c7..710e593 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>  static void get_page_bootmem(unsigned long info,  struct page *page,
>  			     unsigned long type)
>  {
> -	page->lru.next = (struct list_head *) type;
> -	SetPagePrivate(page);
> -	set_page_private(page, info);
> -	atomic_inc(&page->_count);
> +	unsigned long page_type;
> +
> +	page_type = (unsigned long) page->lru.next;
> +	if (type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> +	    type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){

I think it should be page_type not type here.

Thanks
Wen Congyang
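
A minimal sketch of the check with page_type in place of type, assuming a simplified page structure. struct mock_page, the two range constants, and get_page_bootmem_sketch() are hypothetical stand-ins; the real constants and struct page live in the kernel headers. The intent is that a page is tagged only the first time it is seen, and later calls just take another reference.

```c
#include <stddef.h>

/* Hypothetical values; the real range is defined in the kernel headers. */
#define MIN_BOOTMEM_TYPE 12UL
#define MAX_BOOTMEM_TYPE 15UL

/* Minimal stand-in for the struct page fields the patch touches. */
struct mock_page {
	unsigned long type_tag;	/* plays the role of page->lru.next */
	unsigned long private;	/* plays the role of page_private() */
	int count;		/* plays the role of page->_count */
};

static void get_page_bootmem_sketch(unsigned long info,
				    struct mock_page *page,
				    unsigned long type)
{
	unsigned long page_type = page->type_tag;

	/* Test the tag already stored in the page, not the argument. */
	if (page_type < MIN_BOOTMEM_TYPE || page_type > MAX_BOOTMEM_TYPE) {
		page->type_tag = type;
		page->private = info;
	}
	/* Both branches of the patch take a reference. */
	page->count++;
}
```

With the test on the argument instead, a caller that always passes an in-range type would presumably never reach the tagging branch, so the page would never record its type or info.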

> +		page->lru.next = (struct list_head *) type;
> +		SetPagePrivate(page);
> +		set_page_private(page, info);
> +		atomic_inc(&page->_count);
> +	} else
> +		atomic_inc(&page->_count);
>  }
>  
>  /* reference to __meminit __free_pages_bootmem is valid

^ permalink raw reply

* RE: [PATCH V4 3/3] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Jia Hongtao-B38951 @ 2012-08-07  6:23 UTC (permalink / raw)
  To: Wood Scott-B07421; +Cc: linuxppc-dev@lists.ozlabs.org, Li Yang-R58472
In-Reply-To: <501FDF9B.4030304@freescale.com>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, August 06, 2012 11:16 PM
> To: Jia Hongtao-B38951
> Cc: Wood Scott-B07421; Kumar Gala; linuxppc-dev@lists.ozlabs.org; Li
> Yang-R58472
> Subject: Re: [PATCH V4 3/3] powerpc/fsl-pci: Unify pci/pcie
> initialization code
>
> On 08/05/2012 09:39 PM, Jia Hongtao-B38951 wrote:
> >
> >
> >> -----Original Message-----
> >> From: Wood Scott-B07421
> >> Sent: Saturday, August 04, 2012 12:04 AM
> >> To: Jia Hongtao-B38951
> >> Cc: Kumar Gala; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Li
> >> Yang-R58472
> >> Subject: Re: [PATCH V4 3/3] powerpc/fsl-pci: Unify pci/pcie
> >> initialization code
> >>
> >> On 08/02/2012 10:39 PM, Jia Hongtao-B38951 wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Kumar Gala [mailto:galak@kernel.crashing.org]
> >>>> Sent: Thursday, August 02, 2012 8:24 PM
> >>>> To: Jia Hongtao-B38951
> >>>> Cc: linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Li
> >>>> Yang-R58472
> >>>> Subject: Re: [PATCH V4 3/3] powerpc/fsl-pci: Unify pci/pcie
> >>>> initialization code
> >>>>
> >>>> You need to convert all boards to use fsl_pci_init before this patch.
> >>>> Otherwise we'll end up with PCI getting initialized twice on boards.
> >>>>
> >>>> - k
> >>>
> >>> If we covert all boards with platform driver in this patch PCI will
> >>> be initialized only once without converting all boards to use
> >>> fsl_pci_init first.
> >>
> >> Then we'd have to pick apart core changes from board changes when
> >> reviewing.
> >>
> >>> If we convert all boards to use fsl_pci_init before this patch and
> >>> convert them to use platform driver again after this patch. Then
> >>> between this patch and next pci will be initialized twice too.
> >>
> >> Why?  That one patch should both create the platform driver and
> >> remove the init from fsl_pci_init() -- except things like primary bus
> >> detection which has to happen globally.
> >>
> >> -Scott
> >
> > "One patch both create the platform driver and remove the init from
> > fsl_pci_init()" means we should create platform driver and applied to
> > all boards. If so why not just directly convert all boards using
> > platform driver?
>
> Because it's harder to review when you have a bunch of board code in the
> patch in addition to core changes.
>
> Because you might want people to actually test on the boards in question
> when converting, especially given the change in how primary buses are
> determined, and that some boards may need to provide their own
> alternative.
>
> -Scott

But if we separate the core changes and the boards update, between this two
patches PCI will be initialized twice.

-Hongtao.

^ permalink raw reply

* RE: [PATCH V5 3/3] powerpc/fsl-pci: Unify pci/pcie initialization code
From: Jia Hongtao-B38951 @ 2012-08-07  8:09 UTC (permalink / raw)
  To: Wood Scott-B07421; +Cc: linuxppc-dev@lists.ozlabs.org, Li Yang-R58472
In-Reply-To: <501FDE40.1060906@freescale.com>

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, August 06, 2012 11:10 PM
> To: Jia Hongtao-B38951
> Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org;
> galak@kernel.crashing.org; Li Yang-R58472
> Subject: Re: [PATCH V5 3/3] powerpc/fsl-pci: Unify pci/pcie
> initialization code
>
> On 08/05/2012 10:07 PM, Jia Hongtao-B38951 wrote:
> >
> >
> >> -----Original Message-----
> >> From: Wood Scott-B07421
> >> Sent: Saturday, August 04, 2012 12:28 AM
> >> To: Jia Hongtao-B38951
> >> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org; Li
> >> Yang- R58472; Wood Scott-B07421
> >> Subject: Re: [PATCH V5 3/3] powerpc/fsl-pci: Unify pci/pcie
> >> initialization code
> >>
> >> On 08/03/2012 05:14 AM, Jia Hongtao wrote:
> >>> -void __devinit fsl_pci_init(void)
> >>> +/* Checkout if PCI contains ISA node */ static int
> >>> +of_pci_has_isa(struct device_node *pci_node) {
> >>> +	struct device_node *np;
> >>> +	int ret = 0;
> >>> +
> >>> +	if (!pci_node)
> >>> +		return 0;
> >>> +
> >>> +	read_lock(&devtree_lock);
> >>> +	np = pci_node->allnext;
> >>> +
> >>> +	/* Only scan the children of PCI node */
> >>> +	for (; np != pci_node->sibling; np = np->allnext) {
> >>> +		if (np->type && (of_node_cmp(np->type, "isa") == 0)
> >>> +		    && of_node_get(np)) {
> >>> +			ret = 1;
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +
> >>> +	of_node_put(pci_node);
> >>> +	read_unlock(&devtree_lock);
> >>> +
> >>> +	return ret;
> >>> +}
> >>
> >> Why do you keep insisting on substituting your ISA search code here?
> >> What advantages does it have over the code that is already there?  It
> >> unnecessarily digs into the internals of the tree representation.
> >>
> >
> > I want ISA search is done from probe.
>
> Too bad.  You're breaking the case where there's no ISA node.
>
> > Also this way is more efficient due
> > to we just search the children of PCI.
>
> It is not more efficient, because you're doing the search for every PCIe
> bus rather than once.  Not that it matters in this context.
>
> >>> +
> >>> +static int __devinit fsl_pci_probe(struct platform_device *pdev)
> >>>  {
> >>>  	int ret;
> >>> -	struct device_node *node;
> >>>  	struct pci_controller *hose;
> >>> -	dma_addr_t max = 0xffffffff;
> >>> +	int is_primary = 0;
> >>>
> >>> -	/* Callers can specify the primary bus using other means. */
> >>>  	if (!fsl_pci_primary) {
> >>> -		/* If a PCI host bridge contains an ISA node, it's primary.
> >> */
> >>> -		node = of_find_node_by_type(NULL, "isa");
> >>> -		while ((fsl_pci_primary = of_get_parent(node))) {
> >>> -			of_node_put(node);
> >>> -			node = fsl_pci_primary;
> >>> -
> >>> -			if (of_match_node(pci_ids, node))
> >>> -				break;
> >>> -		}
> >>> +		is_primary = of_pci_has_isa(pdev->dev.of_node);
> >>> +		if (is_primary)
> >>> +			fsl_pci_primary = pdev->dev.of_node;
> >>>  	}
> >>
> >> As I explained before, this has to be done globally, not from the
> >> probe function, so we can assign a default primary bus if there isn't
> any ISA.
> >>  There are bugs in the Linux PPC PCI code relating to not having any
> >> primary bus.
> >>
> >> -Scott
> >
> > In my way of searching ISA you can also assign a default primary bus
> > in board specific files.
>
> That was meant for when the board file had an alternate way of searching
> for the primary bus (e.g. look for i8259), not as a replacement for the
> mechanism that guarantees there's a primary bus.
>
> You are causing a regression in the qemu_e500.c platform.
>
> > I read your code and found that if there is no ISA node you will
> > assign the first PCI bus scanned as primary. It's not all right. Take
> > ge_imp3a as an
> > example: The second PCI bus (9000) is primary not the first one.
>
> Does that board have ISA on it, that isn't described by the device tree?
>  If so, before converting to the new init mechanism, the board code will
> need to set fsl_pci_primary based on its own knowledge of where that ISA
> is.  If it doesn't have ISA, it doesn't matter which one we designate as
> primary.
>
> > I doubt that there are bugs if no primary assigned.
>
> Yeah, I just implemented the fallback for fun.  Come on.
>
> It was recently discussed on this list.  PCI under QEMU did not work
> without it.
>
> > Like mpc85xx_rdb assigned
> > no primary at all. Some other boards has no primary ether like
> > p1022ds, p1021mds, p1010rdb, p1023rds, all corenet boards (p2041_rdb,
> > p3041_ds, p4080_ds, p5020_ds, p5040_ds). If no primary is a bug then
> > all these boards above are not correctly setting up.
>
> Those boards are not being correctly set up.  On real hardware things
> work by chance, but not under QEMU.
>
> -Scott

I am really not sure that all boards need primary bus. Could you give me
the link of discussion about primary that you mentioned?

-Hongtao.


* [PATCH 1/4] powerpc/85xx: add sleep and deep sleep support
From: Zhao Chenhui @ 2012-08-07  8:43 UTC (permalink / raw)
  To: linuxppc-dev, galak; +Cc: linux-kernel

In sleep PM mode, the clocks of the e500 core and unused IP blocks are
turned off. IP blocks that are allowed to wake up the processor
keep running.

Some Freescale chips, such as the MPC8536 and P1022, have a deep sleep
PM mode in addition to the sleep PM mode.

While in deep sleep PM mode, additionally, the power supply is
removed from e500 core and most IP blocks. Only the blocks needed
to wake up the chip out of deep sleep are ON.

This patch supports 32-bit and 36-bit address space.
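
As an illustration of the 32-bit/36-bit point (not part of the patch):
the assembly entry point takes the CCSR base as a 64-bit value, which a
32-bit PowerPC caller passes as a high word and a low word. A minimal
userspace sketch of that split follows. The names split_ccsrbar(),
mas3_rpn() and SKETCH_MAS3_RPN are hypothetical, and the 0xFFFFF000
mask is an assumption about the kernel's MAS3_RPN definition.

```c
#include <stdint.h>

/* Assumed value of the real-page-number mask applied to the low word. */
#define SKETCH_MAS3_RPN 0xFFFFF000u

struct ccsr_words {
	uint32_t high;	/* would arrive in r3 and be written to MAS7 */
	uint32_t low;	/* would arrive in r4 and be masked into MAS3 */
};

/* Split a 64-bit physical CCSR base into its two 32-bit halves. */
static struct ccsr_words split_ccsrbar(uint64_t ccsrbar)
{
	struct ccsr_words w;

	w.high = (uint32_t)(ccsrbar >> 32);
	w.low = (uint32_t)(ccsrbar & 0xFFFFFFFFu);
	return w;
}

/* Page-aligned part of the low word, as used for the MAS3 RPN field. */
static uint32_t mas3_rpn(struct ccsr_words w)
{
	return w.low & SKETCH_MAS3_RPN;
}
```

With a 32-bit CCSR base the high word is simply zero, so the same code
path should serve both address-space sizes.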

The sleep mode corresponds to the Standby state in Linux. The deep sleep
mode corresponds to the Suspend-to-RAM state of Linux Power Management.

Command to enter sleep mode.
  echo standby > /sys/power/state
Command to enter deep sleep mode.
  echo mem > /sys/power/state
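
For reference, the same requests can be issued from a small C program
instead of the shell. This is only a sketch: request_pm_state() is a
hypothetical helper, and the path is a parameter so that the function
can be tried against an ordinary file before it is pointed at
/sys/power/state.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "standby" (sleep) or "mem" (deep sleep) to the given file.
 * Returns 0 on success and -1 on an unknown state or an I/O error.
 */
static int request_pm_state(const char *path, const char *state)
{
	FILE *f;
	int ok;

	/* Only the two states described above are accepted here. */
	if (strcmp(state, "standby") != 0 && strcmp(state, "mem") != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling request_pm_state("/sys/power/state", "mem") as root would be
the equivalent of the second command above. Whether the write succeeds
depends on which states the platform reports as valid.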

Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
---
 arch/powerpc/Kconfig                  |    2 +-
 arch/powerpc/include/asm/cacheflush.h |    2 +
 arch/powerpc/kernel/Makefile          |    1 +
 arch/powerpc/kernel/cache_fsl.S       |   57 +++
 arch/powerpc/platforms/85xx/Makefile  |    1 +
 arch/powerpc/platforms/85xx/sleep.S   |  621 +++++++++++++++++++++++++++++++++
 arch/powerpc/sysdev/fsl_pmc.c         |   98 +++++-
 arch/powerpc/sysdev/fsl_soc.h         |    5 +
 8 files changed, 768 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/kernel/cache_fsl.S
 create mode 100644 arch/powerpc/platforms/85xx/sleep.S

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d894069..d7b0517 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -667,7 +667,7 @@ config FSL_PCI
 config FSL_PMC
 	bool
 	default y
-	depends on SUSPEND && (PPC_85xx || PPC_86xx)
+	depends on SUSPEND && (PPC_85xx || PPC_86xx) && !PPC_E500MC
 	help
 	  Freescale MPC85xx/MPC86xx power management controller support
 	  (suspend/resume). For MPC83xx see platforms/83xx/suspend.c
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index b843e35..6c5f1c2 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -58,6 +58,8 @@ extern void flush_inval_dcache_range(unsigned long start, unsigned long stop);
 extern void flush_dcache_phys_range(unsigned long start, unsigned long stop);
 #endif
 
+extern void flush_dcache_L1(void);
+
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 	do { \
 		memcpy(dst, src, len); \
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index bb282dd..21e025b 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_FA_DUMP)		+= fadump.o
 ifeq ($(CONFIG_PPC32),y)
 obj-$(CONFIG_E500)		+= idle_e500.o
 endif
+obj-y				+= cache_fsl.o
 obj-$(CONFIG_6xx)		+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o
 obj-$(CONFIG_TAU)		+= tau_6xx.o
 obj-$(CONFIG_HIBERNATION)	+= swsusp.o suspend.o
diff --git a/arch/powerpc/kernel/cache_fsl.S b/arch/powerpc/kernel/cache_fsl.S
new file mode 100644
index 0000000..25cd22e
--- /dev/null
+++ b/arch/powerpc/kernel/cache_fsl.S
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2009-2012 Freescale Semiconductor, Inc. All rights reserved.
+ *	Scott Wood <scottwood@freescale.com>
+ *	Dave Liu <daveliu@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/ppc_asm.h>
+#include <asm/asm-offsets.h>
+
+#define L2CTL_L2E	0x80000000
+#define L2CTL_L2I	0x40000000
+
+	.section .text
+
+#ifdef CONFIG_FSL_PMC
+	/* r3 = virtual address of L2 controller, WIMG = 01xx */
+_GLOBAL(flush_disable_L2)
+	/* It's a write-through cache, so only invalidation is needed. */
+	mbar
+	isync
+	lwz	r4, 0(r3)
+	li	r5, 1
+	rlwimi	r4, r5, 30, L2CTL_L2E | L2CTL_L2I
+	stw	r4, 0(r3)
+
+	/* Wait for the invalidate to finish */
+1:	lwz	r4, 0(r3)
+	andis.	r4, r4, L2CTL_L2I@h
+	bne	1b
+	mbar
+
+	blr
+
+	/* r3 = virtual address of L2 controller, WIMG = 01xx */
+_GLOBAL(invalidate_enable_L2)
+	mbar
+	isync
+	lwz	r4, 0(r3)
+	li	r5, 3
+	rlwimi	r4, r5, 30, L2CTL_L2E | L2CTL_L2I
+	stw	r4, 0(r3)
+
+	/* Wait for the invalidate to finish */
+1:	lwz	r4, 0(r3)
+	andis.	r4, r4, L2CTL_L2I@h
+	bne	1b
+	mbar
+
+	blr
+#endif
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 76f679c..8a030a1 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_SMP) += smp.o
 
 obj-y += common.o
+obj-$(CONFIG_FSL_PMC) += sleep.o
 
 obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
 obj-$(CONFIG_MPC8540_ADS) += mpc85xx_ads.o
diff --git a/arch/powerpc/platforms/85xx/sleep.S b/arch/powerpc/platforms/85xx/sleep.S
new file mode 100644
index 0000000..e6dfede
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/sleep.S
@@ -0,0 +1,621 @@
+/*
+ * Enter and leave deep sleep/sleep state on MPC85xx
+ *
+ * Author: Scott Wood <scottwood@freescale.com>
+ *
+ * Copyright (C) 2006-2012 Freescale Semiconductor, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <asm/page.h>
+#include <asm/ppc_asm.h>
+#include <asm/reg.h>
+#include <asm/asm-offsets.h>
+#include <asm/mmu.h>
+
+#define CCSR_ADDR		0xf0000000
+
+#define L2C_OFFSET		0x20000	/* L2 Cache Controller offset */
+
+#define BPTR_OFFSET		0x20 /* Boot Page Translation Register */
+#define BPTR_EN			0x80000000
+
+#define PMRCCR_OFFSET		0xe0084
+#define PMRCCR_VRCNT_PRE_MASK	0x1f000000
+#define PMRCCR_VRCNT_MASK	0x00ff0000
+
+#define POWMGTSCR_OFFSET	0xe0080
+#define POWMGTSCR_DPSLP		0x00100000 /* deep sleep mode */
+
+#define SS_TB		0x00
+#define SS_HID		0x08 /* 2 HIDs */
+#define SS_IAC		0x10 /* 2 IACs */
+#define SS_DAC		0x18 /* 2 DACs */
+#define SS_DBCR		0x20 /* 3 DBCRs */
+#define SS_PID		0x2c /* 3 PIDs */
+#define SS_SPRG		0x38 /* 8 SPRGs */
+#define SS_IVOR		0x58 /* 20 interrupt vectors */
+#define SS_TCR		0xa8
+#define SS_BUCSR	0xac
+#define SS_L1CSR	0xb0 /* 2 L1CSRs */
+#define SS_MSR		0xb8
+#define SS_USPRG	0xbc
+#define SS_GPREG	0xc0 /* r12-r31 */
+#define SS_LR		0x110
+#define SS_CR		0x114
+#define SS_SP		0x118
+#define SS_CURRENT	0x11c
+#define SS_IVPR		0x120
+#define SS_BPTR		0x124
+
+#define STATE_SAVE_SIZE 0x128
+
+	.section .data
+	.align	5
+mpc85xx_sleep_save_area:
+	.space	STATE_SAVE_SIZE
+ccsrbase_low:
+	.long	0
+ccsrbase_high:
+	.long	0
+powmgtreq:
+	.long	0
+
+	.section .text
+	.align	12
+
+	/*
+	 * r3 = high word of physical address of CCSR
+	 * r4 = low word of physical address of CCSR
+	 * r5 = JOG or deep sleep request
+	 *      JOG-0x00200000, deep sleep-0x00100000
+	 */
+_GLOBAL(mpc85xx_enter_deep_sleep)
+	lis	r6, ccsrbase_low@ha
+	stw	r4, ccsrbase_low@l(r6)
+	lis	r6, ccsrbase_high@ha
+	stw	r3, ccsrbase_high@l(r6)
+
+	lis	r6, powmgtreq@ha
+	stw	r5, powmgtreq@l(r6)
+
+	lis	r10, mpc85xx_sleep_save_area@h
+	ori	r10, r10, mpc85xx_sleep_save_area@l
+
+	mfspr	r5, SPRN_HID0
+	mfspr	r6, SPRN_HID1
+
+	stw	r5, SS_HID+0(r10)
+	stw	r6, SS_HID+4(r10)
+
+	mfspr	r4, SPRN_IAC1
+	mfspr	r5, SPRN_IAC2
+	mfspr	r6, SPRN_DAC1
+	mfspr	r7, SPRN_DAC2
+
+	stw	r4, SS_IAC+0(r10)
+	stw	r5, SS_IAC+4(r10)
+	stw	r6, SS_DAC+0(r10)
+	stw	r7, SS_DAC+4(r10)
+
+	mfspr	r4, SPRN_DBCR0
+	mfspr	r5, SPRN_DBCR1
+	mfspr	r6, SPRN_DBCR2
+
+	stw	r4, SS_DBCR+0(r10)
+	stw	r5, SS_DBCR+4(r10)
+	stw	r6, SS_DBCR+8(r10)
+
+	mfspr	r4, SPRN_PID0
+	mfspr	r5, SPRN_PID1
+	mfspr	r6, SPRN_PID2
+
+	stw	r4, SS_PID+0(r10)
+	stw	r5, SS_PID+4(r10)
+	stw	r6, SS_PID+8(r10)
+
+	mfspr	r4, SPRN_SPRG0
+	mfspr	r5, SPRN_SPRG1
+	mfspr	r6, SPRN_SPRG2
+	mfspr	r7, SPRN_SPRG3
+
+	stw	r4, SS_SPRG+0x00(r10)
+	stw	r5, SS_SPRG+0x04(r10)
+	stw	r6, SS_SPRG+0x08(r10)
+	stw	r7, SS_SPRG+0x0c(r10)
+
+	mfspr	r4, SPRN_SPRG4
+	mfspr	r5, SPRN_SPRG5
+	mfspr	r6, SPRN_SPRG6
+	mfspr	r7, SPRN_SPRG7
+
+	stw	r4, SS_SPRG+0x10(r10)
+	stw	r5, SS_SPRG+0x14(r10)
+	stw	r6, SS_SPRG+0x18(r10)
+	stw	r7, SS_SPRG+0x1c(r10)
+
+	mfspr	r4, SPRN_IVPR
+	stw	r4, SS_IVPR(r10)
+
+	mfspr	r4, SPRN_IVOR0
+	mfspr	r5, SPRN_IVOR1
+	mfspr	r6, SPRN_IVOR2
+	mfspr	r7, SPRN_IVOR3
+
+	stw	r4, SS_IVOR+0x00(r10)
+	stw	r5, SS_IVOR+0x04(r10)
+	stw	r6, SS_IVOR+0x08(r10)
+	stw	r7, SS_IVOR+0x0c(r10)
+
+	mfspr	r4, SPRN_IVOR4
+	mfspr	r5, SPRN_IVOR5
+	mfspr	r6, SPRN_IVOR6
+	mfspr	r7, SPRN_IVOR7
+
+	stw	r4, SS_IVOR+0x10(r10)
+	stw	r5, SS_IVOR+0x14(r10)
+	stw	r6, SS_IVOR+0x18(r10)
+	stw	r7, SS_IVOR+0x1c(r10)
+
+	mfspr	r4, SPRN_IVOR8
+	mfspr	r5, SPRN_IVOR9
+	mfspr	r6, SPRN_IVOR10
+	mfspr	r7, SPRN_IVOR11
+
+	stw	r4, SS_IVOR+0x20(r10)
+	stw	r5, SS_IVOR+0x24(r10)
+	stw	r6, SS_IVOR+0x28(r10)
+	stw	r7, SS_IVOR+0x2c(r10)
+
+	mfspr	r4, SPRN_IVOR12
+	mfspr	r5, SPRN_IVOR13
+	mfspr	r6, SPRN_IVOR14
+	mfspr	r7, SPRN_IVOR15
+
+	stw	r4, SS_IVOR+0x30(r10)
+	stw	r5, SS_IVOR+0x34(r10)
+	stw	r6, SS_IVOR+0x38(r10)
+	stw	r7, SS_IVOR+0x3c(r10)
+
+	mfspr	r4, SPRN_IVOR32
+	mfspr	r5, SPRN_IVOR33
+	mfspr	r6, SPRN_IVOR34
+	mfspr	r7, SPRN_IVOR35
+
+	stw	r4, SS_IVOR+0x40(r10)
+	stw	r5, SS_IVOR+0x44(r10)
+	stw	r6, SS_IVOR+0x48(r10)
+	stw	r7, SS_IVOR+0x4c(r10)
+
+	mfspr	r4, SPRN_TCR
+	mfspr	r5, SPRN_BUCSR
+	mfspr	r6, SPRN_L1CSR0
+	mfspr	r7, SPRN_L1CSR1
+	mfspr	r8, SPRN_USPRG0
+
+	stw	r4, SS_TCR(r10)
+	stw	r5, SS_BUCSR(r10)
+	stw	r6, SS_L1CSR+0(r10)
+	stw	r7, SS_L1CSR+4(r10)
+	stw	r8, SS_USPRG+0(r10)
+
+	stmw	r12, SS_GPREG(r10)
+
+	mfmsr	r4
+	mflr	r5
+	mfcr	r6
+
+	stw	r4, SS_MSR(r10)
+	stw	r5, SS_LR(r10)
+	stw	r6, SS_CR(r10)
+	stw	r1, SS_SP(r10)
+	stw	r2, SS_CURRENT(r10)
+
+1:	mftbu	r4
+	mftb	r5
+	mftbu	r6
+	cmpw	r4, r6
+	bne	1b
+
+	stw	r4, SS_TB+0(r10)
+	stw	r5, SS_TB+4(r10)
+
+	lis	r5, ccsrbase_low@ha
+	lwz	r4, ccsrbase_low@l(r5)
+	lis	r5, ccsrbase_high@ha
+	lwz	r3, ccsrbase_high@l(r5)
+
+	/* Disable machine checks and critical exceptions */
+	mfmsr	r5
+	rlwinm	r5, r5, 0, ~MSR_CE
+	rlwinm	r5, r5, 0, ~MSR_ME
+	mtmsr	r5
+	isync
+
+	/* Use TLB1[15] to map the CCSR at 0xf0000000 */
+	LOAD_REG_IMMEDIATE(r5, MAS0_TLBSEL(1) | MAS0_ESEL(15))
+	mtspr	SPRN_MAS0, r5
+	LOAD_REG_IMMEDIATE(r5,
+		MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(BOOK3E_PAGESZ_1M))
+	mtspr	SPRN_MAS1, r5
+	LOAD_REG_IMMEDIATE(r5, CCSR_ADDR | MAS2_I | MAS2_M)
+	mtspr	SPRN_MAS2, r5
+	rlwinm	r5, r4, 0, MAS3_RPN
+	ori	r5, r5, (MAS3_SW | MAS3_SR)
+	mtspr	SPRN_MAS3, r5
+	mtspr	SPRN_MAS7, r3
+	isync
+	tlbwe
+	isync
+
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + BPTR_OFFSET)
+	lwz	r4, 0(r3)
+	stw	r4, SS_BPTR(r10)
+
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + L2C_OFFSET)
+	bl	flush_disable_L2
+	bl	__flush_disable_L1
+
+	/* Enable I-cache, so as not to upset the bus
+	 * with our loop.
+	 */
+	mfspr	r4, SPRN_L1CSR1
+	ori	r4, r4, L1CSR1_ICE
+	mtspr	SPRN_L1CSR1, r4
+	isync
+
+	/* Set boot page translation */
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + BPTR_OFFSET)
+	lis	r4, (mpc85xx_deep_resume - PAGE_OFFSET)@h
+	ori	r4, r4, (mpc85xx_deep_resume - PAGE_OFFSET)@l
+	rlwinm	r4, r4, 20, 12, 31
+	oris	r4, r4, BPTR_EN@h
+	stw	r4, 0(r3)
+	lwz	r4, 0(r3) /* read-back to flush write */
+	twi	0, r4, 0
+	isync
+
+	/* Disable the decrementer */
+	mfspr	r4, SPRN_TCR
+	rlwinm	r4, r4, 0, ~TCR_DIE
+	mtspr	SPRN_TCR, r4
+
+	mfspr	r4, SPRN_TSR
+	oris	r4, r4, TSR_DIS@h
+	mtspr	SPRN_TSR, r4
+
+	/* set PMRCCR[VRCNT] to wait 40ms for the power supply to stabilize */
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + PMRCCR_OFFSET)
+	lwz	r4, 0(r3)
+	li	r5, 0x12
+	rlwimi	r4, r5, 0, PMRCCR_VRCNT_PRE_MASK
+	li	r5, 0xa3
+	rlwimi	r4, r5, 0, PMRCCR_VRCNT_MASK
+	stw	r4, 0(r3)
+	lwz	r4, 0(r3)
+
+	/* set deep sleep bit in POWMGTSCR */
+	lis	r3, powmgtreq@ha
+	lwz	r8, powmgtreq@l(r3)
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + POWMGTSCR_OFFSET)
+	lwz	r4, 0(r3)
+	or	r4, r4, r8
+	stw	r4, 0(r3)
+	lwz	r4, 0(r3)		/* read-back to flush write */
+	twi	0, r4, 0
+	isync
+
+	mftb	r5
+1:	/* spin until either we enter deep sleep, or the sleep process is
+	 * aborted due to a pending wakeup event.  Wait some time between
+	 * accesses, so we don't flood the bus and prevent the pmc from
+	 * detecting an idle system.
+	 */
+
+	mftb	r4
+	subf	r7, r5, r4
+	cmpwi	r7, 1000
+	blt	1b
+	mr	r5, r4
+
+	lwz	r6, 0(r3)
+	andis.	r6, r6, POWMGTSCR_DPSLP@h
+	bne	1b
+	b	2f
+
+2:	mfspr	r4, SPRN_PIR
+	andi.	r4, r4, 1
+99:	bne	99b
+
+	/* Establish a temporary 64MB 0->0 mapping in TLB1[1]. */
+	LOAD_REG_IMMEDIATE(r4, MAS0_TLBSEL(1) | MAS0_ESEL(1))
+	mtspr	SPRN_MAS0, r4
+	LOAD_REG_IMMEDIATE(r4,
+		MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(BOOK3E_PAGESZ_64M))
+	mtspr	SPRN_MAS1, r4
+	li	r4, 0
+	mtspr	SPRN_MAS2, r4
+	li	r4, (MAS3_SX | MAS3_SW | MAS3_SR)
+	mtspr	SPRN_MAS3, r4
+	li	r4, 0
+	mtspr	SPRN_MAS7, r4
+	isync
+	tlbwe
+	isync
+
+	lis	r3, (3f - PAGE_OFFSET)@h
+	ori	r3, r3, (3f - PAGE_OFFSET)@l
+	mtctr	r3
+	bctr
+
+	/* Locate the resume vector in the last word of the current page. */
+	. = mpc85xx_enter_deep_sleep + 0xffc
+mpc85xx_deep_resume:
+	b	2b
+
+3:
+	/* Restore the contents of TLB1[0].  It is assumed that it covers
+	 * the currently executing code and the sleep save area, and that
+	 * it does not alias our temporary mapping (which is at virtual zero).
+	 */
+	lis	r3, (TLBCAM - PAGE_OFFSET)@h
+	ori	r3, r3, (TLBCAM - PAGE_OFFSET)@l
+
+	lwz	r4, 0(r3)
+	lwz	r5, 4(r3)
+	lwz	r6, 8(r3)
+	lwz	r7, 12(r3)
+	lwz	r8, 16(r3)
+
+	mtspr	SPRN_MAS0, r4
+	mtspr	SPRN_MAS1, r5
+	mtspr	SPRN_MAS2, r6
+	mtspr	SPRN_MAS3, r7
+	mtspr	SPRN_MAS7, r8
+
+	isync
+	tlbwe
+	isync
+
+	/* Access the ccsrbase address with TLB1[0] */
+	lis	r5, ccsrbase_low@ha
+	lwz	r4, ccsrbase_low@l(r5)
+	lis	r5, ccsrbase_high@ha
+	lwz	r3, ccsrbase_high@l(r5)
+
+	/* Use TLB1[15] to map the CCSR at 0xf0000000 */
+	LOAD_REG_IMMEDIATE(r5, MAS0_TLBSEL(1) | MAS0_ESEL(15))
+	mtspr	SPRN_MAS0, r5
+	LOAD_REG_IMMEDIATE(r5,
+		MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(BOOK3E_PAGESZ_1M))
+	mtspr	SPRN_MAS1, r5
+	LOAD_REG_IMMEDIATE(r5, CCSR_ADDR | MAS2_I | MAS2_M)
+	mtspr	SPRN_MAS2, r5
+	rlwinm	r5, r4, 0, MAS3_RPN
+	ori	r5, r5, (MAS3_SW | MAS3_SR)
+	mtspr	SPRN_MAS3, r5
+	mtspr	SPRN_MAS7, r3
+	isync
+	tlbwe
+	isync
+
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + L2C_OFFSET)
+	bl	invalidate_enable_L2
+
+	/* Access the MEM(r10) with TLB1[0] */
+	lis	r10, mpc85xx_sleep_save_area@h
+	ori	r10, r10, mpc85xx_sleep_save_area@l
+
+	LOAD_REG_IMMEDIATE(r3, CCSR_ADDR + BPTR_OFFSET)
+	lwz	r4, SS_BPTR(r10)
+	stw	r4, 0(r3)		/* restore BPTR */
+
+	/* Shift the running address space to PAGE_OFFSET */
+	mfmsr	r3
+	lis	r4, 1f@h
+	ori	r4, r4, 1f@l
+
+	mtsrr1	r3
+	mtsrr0	r4
+	rfi
+
+1:	/* Restore the rest of TLB1, in ascending order so that
+	 * the TLB1[1] gets invalidated first.
+	 *
+	 * XXX: It's better to invalidate the temporary mapping
+	 * TLB1[15] for CCSR before restoring any TLB1 entry, including 0.
+	 */
+	LOAD_REG_IMMEDIATE(r4, MAS0_TLBSEL(1) | MAS0_ESEL(15))
+	mtspr	SPRN_MAS0, r4
+	lis	r4, 0
+	mtspr	SPRN_MAS1, r4
+	isync
+	tlbwe
+	isync
+
+	lis	r3, (TLBCAM + 5*4 - 4)@h
+	ori	r3, r3, (TLBCAM + 5*4 - 4)@l
+	li	r4, 15
+	mtctr	r4
+
+2:
+	lwz	r5, 4(r3)
+	lwz	r6, 8(r3)
+	lwz	r7, 12(r3)
+	lwz	r8, 16(r3)
+	lwzu	r9, 20(r3)
+
+	mtspr	SPRN_MAS0, r5
+	mtspr	SPRN_MAS1, r6
+	mtspr	SPRN_MAS2, r7
+	mtspr	SPRN_MAS3, r8
+	mtspr	SPRN_MAS7, r9
+
+	isync
+	tlbwe
+	isync
+	bdnz	2b
+
+	lis	r10, mpc85xx_sleep_save_area@h
+	ori	r10, r10, mpc85xx_sleep_save_area@l
+
+	lwz	r5, SS_HID+0(r10)
+	lwz	r6, SS_HID+4(r10)
+
+	isync
+	mtspr	SPRN_HID0, r5
+	isync
+
+	msync
+	mtspr	SPRN_HID1, r6
+	isync
+
+	lwz	r4, SS_IAC+0(r10)
+	lwz	r5, SS_IAC+4(r10)
+	lwz	r6, SS_DAC+0(r10)
+	lwz	r7, SS_DAC+4(r10)
+
+	mtspr	SPRN_IAC1, r4
+	mtspr	SPRN_IAC2, r5
+	mtspr	SPRN_DAC1, r6
+	mtspr	SPRN_DAC2, r7
+
+	lwz	r4, SS_DBCR+0(r10)
+	lwz	r5, SS_DBCR+4(r10)
+	lwz	r6, SS_DBCR+8(r10)
+
+	mtspr	SPRN_DBCR0, r4
+	mtspr	SPRN_DBCR1, r5
+	mtspr	SPRN_DBCR2, r6
+
+	lwz	r4, SS_PID+0(r10)
+	lwz	r5, SS_PID+4(r10)
+	lwz	r6, SS_PID+8(r10)
+
+	mtspr	SPRN_PID0, r4
+	mtspr	SPRN_PID1, r5
+	mtspr	SPRN_PID2, r6
+
+	lwz	r4, SS_SPRG+0x00(r10)
+	lwz	r5, SS_SPRG+0x04(r10)
+	lwz	r6, SS_SPRG+0x08(r10)
+	lwz	r7, SS_SPRG+0x0c(r10)
+
+	mtspr	SPRN_SPRG0, r4
+	mtspr	SPRN_SPRG1, r5
+	mtspr	SPRN_SPRG2, r6
+	mtspr	SPRN_SPRG3, r7
+
+	lwz	r4, SS_SPRG+0x10(r10)
+	lwz	r5, SS_SPRG+0x14(r10)
+	lwz	r6, SS_SPRG+0x18(r10)
+	lwz	r7, SS_SPRG+0x1c(r10)
+
+	mtspr	SPRN_SPRG4, r4
+	mtspr	SPRN_SPRG5, r5
+	mtspr	SPRN_SPRG6, r6
+	mtspr	SPRN_SPRG7, r7
+
+	lwz	r4, SS_IVPR(r10)
+	mtspr	SPRN_IVPR, r4
+
+	lwz	r4, SS_IVOR+0x00(r10)
+	lwz	r5, SS_IVOR+0x04(r10)
+	lwz	r6, SS_IVOR+0x08(r10)
+	lwz	r7, SS_IVOR+0x0c(r10)
+
+	mtspr	SPRN_IVOR0, r4
+	mtspr	SPRN_IVOR1, r5
+	mtspr	SPRN_IVOR2, r6
+	mtspr	SPRN_IVOR3, r7
+
+	lwz	r4, SS_IVOR+0x10(r10)
+	lwz	r5, SS_IVOR+0x14(r10)
+	lwz	r6, SS_IVOR+0x18(r10)
+	lwz	r7, SS_IVOR+0x1c(r10)
+
+	mtspr	SPRN_IVOR4, r4
+	mtspr	SPRN_IVOR5, r5
+	mtspr	SPRN_IVOR6, r6
+	mtspr	SPRN_IVOR7, r7
+
+	lwz	r4, SS_IVOR+0x20(r10)
+	lwz	r5, SS_IVOR+0x24(r10)
+	lwz	r6, SS_IVOR+0x28(r10)
+	lwz	r7, SS_IVOR+0x2c(r10)
+
+	mtspr	SPRN_IVOR8, r4
+	mtspr	SPRN_IVOR9, r5
+	mtspr	SPRN_IVOR10, r6
+	mtspr	SPRN_IVOR11, r7
+
+	lwz	r4, SS_IVOR+0x30(r10)
+	lwz	r5, SS_IVOR+0x34(r10)
+	lwz	r6, SS_IVOR+0x38(r10)
+	lwz	r7, SS_IVOR+0x3c(r10)
+
+	mtspr	SPRN_IVOR12, r4
+	mtspr	SPRN_IVOR13, r5
+	mtspr	SPRN_IVOR14, r6
+	mtspr	SPRN_IVOR15, r7
+
+	lwz	r4, SS_IVOR+0x40(r10)
+	lwz	r5, SS_IVOR+0x44(r10)
+	lwz	r6, SS_IVOR+0x48(r10)
+	lwz	r7, SS_IVOR+0x4c(r10)
+
+	mtspr	SPRN_IVOR32, r4
+	mtspr	SPRN_IVOR33, r5
+	mtspr	SPRN_IVOR34, r6
+	mtspr	SPRN_IVOR35, r7
+
+	lwz	r4, SS_TCR(r10)
+	lwz	r5, SS_BUCSR(r10)
+	lwz	r6, SS_L1CSR+0(r10)
+	lwz	r7, SS_L1CSR+4(r10)
+	lwz	r8, SS_USPRG+0(r10)
+
+	mtspr	SPRN_TCR, r4
+	mtspr	SPRN_BUCSR, r5
+
+	msync
+	isync
+	mtspr	SPRN_L1CSR0, r6
+	isync
+
+	mtspr	SPRN_L1CSR1, r7
+	isync
+
+	mtspr	SPRN_USPRG0, r8
+
+	lmw	r12, SS_GPREG(r10)
+
+	lwz	r1, SS_SP(r10)
+	lwz	r2, SS_CURRENT(r10)
+	lwz	r4, SS_MSR(r10)
+	lwz	r5, SS_LR(r10)
+	lwz	r6, SS_CR(r10)
+
+	msync
+	mtmsr	r4
+	isync
+
+	mtlr	r5
+	mtcr	r6
+
+	li	r4, 0
+	mtspr	SPRN_TBWL, r4
+
+	lwz	r4, SS_TB+0(r10)
+	lwz	r5, SS_TB+4(r10)
+
+	mtspr	SPRN_TBWU, r4
+	mtspr	SPRN_TBWL, r5
+
+	lis	r3, 1
+	mtdec	r3
+
+	blr
diff --git a/arch/powerpc/sysdev/fsl_pmc.c b/arch/powerpc/sysdev/fsl_pmc.c
index 592a0f8..45718c5 100644
--- a/arch/powerpc/sysdev/fsl_pmc.c
+++ b/arch/powerpc/sysdev/fsl_pmc.c
@@ -2,6 +2,7 @@
  * Suspend/resume support
  *
  * Copyright 2009  MontaVista Software, Inc.
+ * Copyright 2010-2012 Freescale Semiconductor Inc.
  *
  * Author: Anton Vorontsov <avorontsov@ru.mvista.com>
  *
@@ -19,39 +20,89 @@
 #include <linux/delay.h>
 #include <linux/device.h>
 #include <linux/of_platform.h>
+#include <linux/pm.h>
+#include <asm/cacheflush.h>
+#include <asm/switch_to.h>
+
+#include <sysdev/fsl_soc.h>
 
 struct pmc_regs {
+	/* 0xe0070: Device disable control register */
 	__be32 devdisr;
+	/* 0xe0074: 2nd Device disable control register */
 	__be32 devdisr2;
-	__be32 :32;
-	__be32 :32;
-	__be32 pmcsr;
-#define PMCSR_SLP	(1 << 17)
+	__be32 res1;
+	/* 0xe007c: Power Management Jog Control Register */
+	__be32 pmjcr;
+	/* 0xe0080: Power management control and status register */
+	__be32 powmgtcsr;
+#define POWMGTCSR_SLP		0x00020000
+#define POWMGTCSR_DPSLP		0x00100000
+	__be32 res3[2];
+	/* 0xe008c: Power management clock disable register */
+	__be32 pmcdr;
 };
 
-static struct device *pmc_dev;
 static struct pmc_regs __iomem *pmc_regs;
+static unsigned int pmc_flag;
+
+#define PMC_SLEEP	0x1
+#define PMC_DEEP_SLEEP	0x2
 
 static int pmc_suspend_enter(suspend_state_t state)
 {
-	int ret;
+	int ret = 0;
+
+	switch (state) {
+#ifdef CONFIG_PPC_85xx
+	case PM_SUSPEND_MEM:
+#ifdef CONFIG_SPE
+		enable_kernel_spe();
+#endif
+		enable_kernel_fp();
+
+		pr_debug("%s: Entering deep sleep\n", __func__);
+
+		local_irq_disable();
+		mpc85xx_enter_deep_sleep(get_immrbase(), POWMGTCSR_DPSLP);
+
+		pr_debug("%s: Resumed from deep sleep\n", __func__);
+		break;
+#endif
 
-	setbits32(&pmc_regs->pmcsr, PMCSR_SLP);
-	/* At this point, the CPU is asleep. */
+	case PM_SUSPEND_STANDBY:
+		local_irq_disable();
+#ifdef CONFIG_PPC_85xx
+		flush_dcache_L1();
+#endif
+		setbits32(&pmc_regs->powmgtcsr, POWMGTCSR_SLP);
+		/* At this point, the CPU is asleep. */
 
-	/* Upon resume, wait for SLP bit to be clear. */
-	ret = spin_event_timeout((in_be32(&pmc_regs->pmcsr) & PMCSR_SLP) == 0,
-				 10000, 10) ? 0 : -ETIMEDOUT;
-	if (ret)
-		dev_err(pmc_dev, "tired waiting for SLP bit to clear\n");
+		/* Upon resume, wait for SLP bit to be clear. */
+		ret = spin_event_timeout(
+			(in_be32(&pmc_regs->powmgtcsr) & POWMGTCSR_SLP) == 0,
+			10000, 10);
+		if (!ret) {
+			pr_err("%s: timeout waiting for SLP bit "
+				"to be cleared\n", __func__);
+			ret = -ETIMEDOUT;
+		}
+		break;
+
+	default:
+		ret = -EINVAL;
+
+	}
 	return ret;
 }
 
 static int pmc_suspend_valid(suspend_state_t state)
 {
-	if (state != PM_SUSPEND_STANDBY)
+	if (((pmc_flag & PMC_SLEEP) && (state == PM_SUSPEND_STANDBY)) ||
+	    ((pmc_flag & PMC_DEEP_SLEEP) && (state == PM_SUSPEND_MEM)))
+		return 1;
+	else
 		return 0;
-	return 1;
 }
 
 static const struct platform_suspend_ops pmc_suspend_ops = {
@@ -59,14 +110,25 @@ static const struct platform_suspend_ops pmc_suspend_ops = {
 	.enter = pmc_suspend_enter,
 };
 
-static int pmc_probe(struct platform_device *ofdev)
+static int pmc_probe(struct platform_device *pdev)
 {
-	pmc_regs = of_iomap(ofdev->dev.of_node, 0);
+	struct device_node *np = pdev->dev.of_node;
+
+	pmc_regs = of_iomap(np, 0);
 	if (!pmc_regs)
 		return -ENOMEM;
 
-	pmc_dev = &ofdev->dev;
+	pmc_flag = PMC_SLEEP;
+	if (of_device_is_compatible(np, "fsl,mpc8536-pmc"))
+		pmc_flag |= PMC_DEEP_SLEEP;
+
+	if (of_device_is_compatible(np, "fsl,p1022-pmc"))
+		pmc_flag |= PMC_DEEP_SLEEP;
+
 	suspend_set_ops(&pmc_suspend_ops);
+
+	pr_info("Freescale PMC driver: sleep(standby)%s\n",
+		(pmc_flag & PMC_DEEP_SLEEP) ? ", deep sleep(mem)" : "");
 	return 0;
 }
 
diff --git a/arch/powerpc/sysdev/fsl_soc.h b/arch/powerpc/sysdev/fsl_soc.h
index c6d0073..11d9f94 100644
--- a/arch/powerpc/sysdev/fsl_soc.h
+++ b/arch/powerpc/sysdev/fsl_soc.h
@@ -48,5 +48,10 @@ extern struct platform_diu_data_ops diu_ops;
 void fsl_hv_restart(char *cmd);
 void fsl_hv_halt(void);
 
+/*
+ * ccsrbar is u64 rather than phys_addr_t so that the assembly
+ * code can be compatible with both 32-bit & 36-bit.
+ */
+extern void mpc85xx_enter_deep_sleep(u64 ccsrbar, u32 powmgtreq);
 #endif
 #endif
-- 
1.6.4.1


* [PATCH 3/4] cpu: export cpu hotplug disable/enable functions as global functions
From: Zhao Chenhui @ 2012-08-07  8:43 UTC (permalink / raw)
  To: linuxppc-dev, galak; +Cc: linux-kernel
In-Reply-To: <1344329006-10645-1-git-send-email-chenhui.zhao@freescale.com>

The cpufreq driver of mpc85xx will disable/enable cpu hotplug temporarily.
Therefore, the related functions should be exported.

Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
---
 include/linux/cpu.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..df8f73d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -146,6 +146,8 @@ void notify_cpu_starting(unsigned int cpu);
 extern void cpu_maps_update_begin(void);
 extern void cpu_maps_update_done(void);
 
+extern void cpu_hotplug_disable_before_freeze(void);
+extern void cpu_hotplug_enable_after_thaw(void);
 #else	/* CONFIG_SMP */
 
 #define cpu_notifier(fn, pri)	do { (void)(fn); } while (0)
@@ -167,6 +169,8 @@ static inline void cpu_maps_update_done(void)
 {
 }
 
+static inline void cpu_hotplug_disable_before_freeze(void)	{}
+static inline void cpu_hotplug_enable_after_thaw(void)	{}
 #endif /* CONFIG_SMP */
 extern struct bus_type cpu_subsys;
 
-- 
1.6.4.1


* [PATCH 4/4] powerpc/85xx: add support to JOG feature using cpufreq interface
From: Zhao Chenhui @ 2012-08-07  8:43 UTC (permalink / raw)
  To: linuxppc-dev, galak; +Cc: linux-kernel
In-Reply-To: <1344329006-10645-1-git-send-email-chenhui.zhao@freescale.com>

Some 85xx silicons like MPC8536 and P1022 have a JOG feature, which provides
a dynamic mechanism to lower or raise the CPU core clock at runtime.

This patch adds support for changing the CPU frequency using the standard
cpufreq interface. The CORE-to-CCB ratio can be 1:1 (except on MPC8536),
3:2, 2:1, 5:2, 3:1, 7:2 and 4:1.
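
To make the ratio list concrete: the frequency tables in the driver
hold the values 2 to 8, and a value n appears to stand for a ratio of
n:2, matching the driver's "sysfreq * pll / 2" computation. The sketch
below works through that arithmetic in plain C. The function names are
hypothetical, and treating sysfreq as the CCB clock in Hz is an
assumption. The multiplication is widened to 64 bits here so that large
CCB clocks cannot wrap before the division.

```c
#include <stdint.h>

/* Core clock in Hz for a CCB clock and a PLL ratio value (n means n:2). */
static uint32_t core_freq_hz(uint32_t ccb_hz, unsigned int pll)
{
	return (uint32_t)(((uint64_t)ccb_hz * pll) / 2);
}

/* Reduce the n:2 ratio to lowest terms, e.g. 3 -> 3:2 and 8 -> 4:1. */
static void core_ccb_ratio(unsigned int pll,
			   unsigned int *num, unsigned int *den)
{
	if (pll % 2 == 0) {
		*num = pll / 2;
		*den = 1;
	} else {
		*num = pll;
		*den = 2;
	}
}
```

For example, with a 400 MHz CCB clock a table value of 3 gives a
600 MHz core clock (3:2) and a table value of 8 gives 1600 MHz (4:1).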

The two CPU cores on P1022 must not be in a low power state during the
frequency transition. The driver uses an atomic counter to meet this
requirement.

The jog mode frequency transition process on the MPC8536 is similar to
the deep sleep process. The driver needs to save the CPU state and
restore it after the CPU warm reset.

Note:
 * The I/O peripherals such as PCIe and eTSEC may lose packets during
   the jog mode frequency transition.
 * The driver doesn't support MPC8536 Rev 1.0 due to a JOG erratum.
   Subsequent revisions of MPC8536 have corrected the erratum.

Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
CC: Scott Wood <scottwood@freescale.com>
---
 arch/powerpc/platforms/85xx/Makefile      |    1 +
 arch/powerpc/platforms/85xx/cpufreq-jog.c |  388 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/Kconfig            |   11 +
 arch/powerpc/sysdev/fsl_pmc.c             |    3 +
 arch/powerpc/sysdev/fsl_soc.h             |    2 +
 5 files changed, 405 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/85xx/cpufreq-jog.c

diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 8a030a1..6156849 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_SMP) += smp.o
 
 obj-y += common.o
 obj-$(CONFIG_FSL_PMC) += sleep.o
+obj-$(CONFIG_MPC85xx_CPUFREQ) += cpufreq-jog.o
 
 obj-$(CONFIG_BSC9131_RDB) += bsc913x_rdb.o
 obj-$(CONFIG_MPC8540_ADS) += mpc85xx_ads.o
diff --git a/arch/powerpc/platforms/85xx/cpufreq-jog.c b/arch/powerpc/platforms/85xx/cpufreq-jog.c
new file mode 100644
index 0000000..ccc0c33
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/cpufreq-jog.c
@@ -0,0 +1,388 @@
+/*
+ * Copyright (C) 2008-2012 Freescale Semiconductor, Inc.
+ * Author: Dave Liu <daveliu@freescale.com>
+ * Modifier: Chenhui Zhao <chenhui.zhao@freescale.com>
+ *
+ * The cpufreq driver is for Freescale 85xx processor,
+ * based on arch/powerpc/platforms/cell/cbe_cpufreq.c
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005-2007
+ *	Christian Krafft <krafft@de.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/cpufreq.h>
+#include <linux/of_platform.h>
+#include <linux/suspend.h>
+#include <linux/cpu.h>
+
+#include <asm/prom.h>
+#include <asm/time.h>
+#include <asm/reg.h>
+#include <asm/io.h>
+#include <asm/machdep.h>
+#include <asm/smp.h>
+
+#include <sysdev/fsl_soc.h>
+
+static DEFINE_MUTEX(mpc85xx_switch_mutex);
+static void __iomem *guts;
+
+static u32 sysfreq;
+static unsigned int max_pll[2];
+static atomic_t in_jog_process;
+static struct cpufreq_frequency_table *mpc85xx_freqs;
+static int (*set_pll)(unsigned int cpu, unsigned int pll);
+
+static struct cpufreq_frequency_table mpc8536_freqs_table[] = {
+	{3,	0},
+	{4,	0},
+	{5,	0},
+	{6,	0},
+	{7,	0},
+	{8,	0},
+	{0,	CPUFREQ_TABLE_END},
+};
+
+static struct cpufreq_frequency_table p1022_freqs_table[] = {
+	{2,	0},
+	{3,	0},
+	{4,	0},
+	{5,	0},
+	{6,	0},
+	{7,	0},
+	{8,	0},
+	{0,	CPUFREQ_TABLE_END},
+};
+
+#define FREQ_500MHz	500000000
+#define FREQ_800MHz	800000000
+
+#define CORE_RATIO_STRIDE	8
+#define CORE_RATIO_MASK		0x3f
+#define CORE_RATIO_SHIFT	16
+
+#define PORPLLSR	0x0	/* Power-On Reset PLL ratio status register */
+
+#define PMJCR		0x7c	/* Power Management Jog Control Register */
+#define PMJCR_CORE0_SPD	0x00001000
+#define PMJCR_CORE_SPD	0x00002000
+
+#define POWMGTCSR	0x80 /* Power management control and status register */
+#define POWMGTCSR_JOG		0x00200000
+#define POWMGTCSR_INT_MASK	0x00000f00
+
+static void spin_while_jogging(void *dummy)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	atomic_inc(&in_jog_process);
+
+	while (atomic_read(&in_jog_process) != 0)
+		barrier();
+
+	local_irq_restore(flags);
+}
+
+static int get_pll(int hw_cpu)
+{
+	int shift;
+	u32 val = in_be32(guts + PORPLLSR);
+
+	shift = hw_cpu * CORE_RATIO_STRIDE + CORE_RATIO_SHIFT;
+
+	return (val >> shift) & CORE_RATIO_MASK;
+}
+
+static int mpc8536_set_pll(unsigned int cpu, unsigned int pll)
+{
+	u32 corefreq, val, mask;
+	unsigned int cur_pll = get_pll(0);
+	unsigned long flags;
+
+	if (pll == cur_pll)
+		return 0;
+
+	val = (pll & CORE_RATIO_MASK) << CORE_RATIO_SHIFT;
+
+	corefreq = sysfreq * pll / 2;
+	/*
+	 * Set the COREx_SPD bit if the requested core frequency
+	 * is larger than the threshold frequency.
+	 */
+	if (corefreq > FREQ_800MHz)
+		val |= PMJCR_CORE_SPD;
+
+	mask = (CORE_RATIO_MASK << CORE_RATIO_SHIFT) | PMJCR_CORE_SPD;
+	clrsetbits_be32(guts + PMJCR, mask, val);
+
+	/* readback to sync write */
+	in_be32(guts + PMJCR);
+
+	local_irq_save(flags);
+	mpc85xx_enter_deep_sleep(get_immrbase(), POWMGTCSR_JOG);
+	local_irq_restore(flags);
+
+	/* verify */
+	cur_pll = get_pll(0);
+	if (cur_pll != pll) {
+		pr_err("%s: error. The current PLL is %d instead of %d.\n",
+				__func__, cur_pll, pll);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int p1022_set_pll(unsigned int cpu, unsigned int pll)
+{
+	int index, hw_cpu = get_hard_smp_processor_id(cpu);
+	int shift;
+	u32 corefreq, val, mask = 0;
+	unsigned int cur_pll = get_pll(hw_cpu);
+	unsigned long flags;
+	int ret = 0;
+
+	if (pll == cur_pll)
+		return 0;
+
+	shift = hw_cpu * CORE_RATIO_STRIDE + CORE_RATIO_SHIFT;
+	val = (pll & CORE_RATIO_MASK) << shift;
+
+	corefreq = sysfreq * pll / 2;
+	/*
+	 * Set the COREx_SPD bit if the requested core frequency
+	 * is larger than the threshold frequency.
+	 */
+	if (corefreq > FREQ_500MHz)
+		val |= PMJCR_CORE0_SPD << hw_cpu;
+
+	mask = (CORE_RATIO_MASK << shift) | (PMJCR_CORE0_SPD << hw_cpu);
+	clrsetbits_be32(guts + PMJCR, mask, val);
+
+	/* readback to sync write */
+	in_be32(guts + PMJCR);
+
+	cpu_hotplug_disable_before_freeze();
+	/*
+	 * A jog request cannot be asserted while any core is in a low
+	 * power state on P1022. Before executing a jog request, any
+	 * core in a low power state must be woken by an
+	 * interrupt and kept awake until the sequence is
+	 * finished.
+	 */
+	for_each_present_cpu(index) {
+		if (!cpu_online(index)) {
+			cpu_hotplug_enable_after_thaw();
+			pr_err("%s: error, core%d is down.\n", __func__, index);
+			return -1;
+		}
+	}
+
+	atomic_set(&in_jog_process, 0);
+	smp_call_function(spin_while_jogging, NULL, 0);
+
+	local_irq_save(flags);
+
+	/* Wait for the other core to wake. */
+	if (!spin_event_timeout(atomic_read(&in_jog_process) == 1, 1000, 100)) {
+		pr_err("%s: timeout, the other core is not in the running state.\n",
+					__func__);
+		ret = -1;
+		goto err;
+	}
+
+	out_be32(guts + POWMGTCSR, POWMGTCSR_JOG | POWMGTCSR_INT_MASK);
+
+	if (!spin_event_timeout(
+		(in_be32(guts + POWMGTCSR) & POWMGTCSR_JOG) == 0, 1000, 100)) {
+		pr_err("%s: timeout, failed to switch the core frequency.\n",
+				__func__);
+		ret = -1;
+		goto err;
+	}
+
+	clrbits32(guts + POWMGTCSR, POWMGTCSR_INT_MASK);
+	in_be32(guts + POWMGTCSR);
+
+err:
+	atomic_set(&in_jog_process, 0);
+	local_irq_restore(flags);
+	cpu_hotplug_enable_after_thaw();
+
+	/* verify */
+	cur_pll = get_pll(hw_cpu);
+	if (cur_pll != pll) {
+		pr_err("%s: error, the current PLL of core %d is %d instead of %d.\n",
+				__func__, hw_cpu, cur_pll, pll);
+		return -1;
+	}
+
+	return ret;
+}
+
+/*
+ * cpufreq functions
+ */
+static int mpc85xx_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+	unsigned int i, cur_pll;
+	int hw_cpu = get_hard_smp_processor_id(policy->cpu);
+
+	if (!cpu_present(policy->cpu))
+		return -ENODEV;
+
+	/* transition latency, in ns */
+	policy->cpuinfo.transition_latency = 2000;
+
+	cur_pll = get_pll(hw_cpu);
+
+	/* initialize frequency table */
+	pr_debug("core%d frequency table:\n", hw_cpu);
+	for (i = 0; mpc85xx_freqs[i].frequency != CPUFREQ_TABLE_END; i++) {
+		if (mpc85xx_freqs[i].index <= max_pll[hw_cpu]) {
+			/* The frequency unit is kHz. */
+			mpc85xx_freqs[i].frequency =
+				(sysfreq * mpc85xx_freqs[i].index / 2) / 1000;
+		} else {
+			mpc85xx_freqs[i].frequency = CPUFREQ_ENTRY_INVALID;
+		}
+
+		pr_debug("%d: %dkHz\n", i, mpc85xx_freqs[i].frequency);
+
+		if (mpc85xx_freqs[i].index == cur_pll)
+			policy->cur = mpc85xx_freqs[i].frequency;
+	}
+	pr_debug("current pll is at %d, and core freq is %d\n",
+			cur_pll, policy->cur);
+
+	cpufreq_frequency_table_get_attr(mpc85xx_freqs, policy->cpu);
+
+	/*
+	 * This ensures that policy->cpuinfo_min
+	 * and policy->cpuinfo_max are set correctly.
+	 */
+	return cpufreq_frequency_table_cpuinfo(policy, mpc85xx_freqs);
+}
+
+static int mpc85xx_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+	cpufreq_frequency_table_put_attr(policy->cpu);
+
+	return 0;
+}
+
+static int mpc85xx_cpufreq_verify(struct cpufreq_policy *policy)
+{
+	return cpufreq_frequency_table_verify(policy, mpc85xx_freqs);
+}
+
+static int mpc85xx_cpufreq_target(struct cpufreq_policy *policy,
+			      unsigned int target_freq,
+			      unsigned int relation)
+{
+	struct cpufreq_freqs freqs;
+	unsigned int new;
+	int ret = 0;
+
+	if (!set_pll)
+		return -ENODEV;
+
+	cpufreq_frequency_table_target(policy,
+				       mpc85xx_freqs,
+				       target_freq,
+				       relation,
+				       &new);
+
+	freqs.old = policy->cur;
+	freqs.new = mpc85xx_freqs[new].frequency;
+	freqs.cpu = policy->cpu;
+
+	mutex_lock(&mpc85xx_switch_mutex);
+	cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+	ret = set_pll(policy->cpu, mpc85xx_freqs[new].index);
+	if (!ret) {
+		pr_info("cpufreq: Setting core%d frequency to %d kHz and PLL ratio to %d:2\n",
+			 policy->cpu, mpc85xx_freqs[new].frequency,
+			 mpc85xx_freqs[new].index);
+
+		ppc_proc_freq = freqs.new * 1000ul;
+	}
+	cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+	mutex_unlock(&mpc85xx_switch_mutex);
+
+	return ret;
+}
+
+static struct cpufreq_driver mpc85xx_cpufreq_driver = {
+	.verify		= mpc85xx_cpufreq_verify,
+	.target		= mpc85xx_cpufreq_target,
+	.init		= mpc85xx_cpufreq_cpu_init,
+	.exit		= mpc85xx_cpufreq_cpu_exit,
+	.name		= "mpc85xx-JOG",
+	.owner		= THIS_MODULE,
+	.flags		= CPUFREQ_CONST_LOOPS,
+};
+
+static struct of_device_id mpc85xx_jog_ids[] = {
+	{ .compatible = "fsl,mpc8536-guts", },
+	{ .compatible = "fsl,p1022-guts", },
+	{}
+};
+
+int mpc85xx_jog_probe(void)
+{
+	struct device_node *np;
+	unsigned int svr;
+
+	np = of_find_matching_node(NULL, mpc85xx_jog_ids);
+	if (!np)
+		return -ENODEV;
+
+	guts = of_iomap(np, 0);
+	if (!guts) {
+		of_node_put(np);
+		return -ENODEV;
+	}
+
+	sysfreq = fsl_get_sys_freq();
+
+	if (of_device_is_compatible(np, "fsl,mpc8536-guts")) {
+		svr = mfspr(SPRN_SVR);
+		if ((svr & 0x7fff) == 0x10) {
+			pr_err("MPC8536 Rev 1.0 does not support cpufreq(JOG).\n");
+			of_node_put(np);
+			return -ENODEV;
+		}
+		mpc85xx_freqs = mpc8536_freqs_table;
+		set_pll = mpc8536_set_pll;
+		max_pll[0] = get_pll(0);
+
+	} else if (of_device_is_compatible(np, "fsl,p1022-guts")) {
+		mpc85xx_freqs = p1022_freqs_table;
+		set_pll = p1022_set_pll;
+		max_pll[0] = get_pll(0);
+		max_pll[1] = get_pll(1);
+	}
+
+	pr_info("Freescale MPC85xx cpufreq(JOG) driver\n");
+
+	of_node_put(np);
+	return cpufreq_register_driver(&mpc85xx_cpufreq_driver);
+}
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index e7a896a..a1518af 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -213,6 +213,17 @@ config CPU_FREQ_PMAC64
 	  This adds support for frequency switching on Apple iMac G5,
 	  and some of the more recent desktop G5 machines as well.
 
+config MPC85xx_CPUFREQ
+	bool "Support for Freescale MPC85xx CPU freq"
+	depends on PPC_85xx && FSL_PMC
+	default n
+	select CPU_FREQ_TABLE
+	help
+	  This adds support for dynamic frequency switching on
+	  Freescale MPC85xx through the cpufreq interface. MPC8536 and
+	  P1022 have a JOG feature, which provides a dynamic mechanism
+	  to lower or raise the CPU core clock at runtime.
+
 config PPC_PASEMI_CPUFREQ
 	bool "Support for PA Semi PWRficient"
 	depends on PPC_PASEMI
diff --git a/arch/powerpc/sysdev/fsl_pmc.c b/arch/powerpc/sysdev/fsl_pmc.c
index b6c8c8f..b809a1b 100644
--- a/arch/powerpc/sysdev/fsl_pmc.c
+++ b/arch/powerpc/sysdev/fsl_pmc.c
@@ -202,6 +202,9 @@ static int pmc_probe(struct platform_device *pdev)
 
 	suspend_set_ops(&pmc_suspend_ops);
 
+#ifdef CONFIG_MPC85xx_CPUFREQ
+	mpc85xx_jog_probe();
+#endif
 	pr_info("Freescale PMC driver: sleep(standby)%s\n",
 		(pmc_flag & PMC_DEEP_SLEEP) ? ", deep sleep(mem)" : "");
 	return 0;
diff --git a/arch/powerpc/sysdev/fsl_soc.h b/arch/powerpc/sysdev/fsl_soc.h
index b1510ef..25be25c 100644
--- a/arch/powerpc/sysdev/fsl_soc.h
+++ b/arch/powerpc/sysdev/fsl_soc.h
@@ -65,5 +65,7 @@ void fsl_hv_halt(void);
  * code can be compatible with both 32-bit & 36-bit.
  */
 extern void mpc85xx_enter_deep_sleep(u64 ccsrbar, u32 powmgtreq);
+
+extern int mpc85xx_jog_probe(void);
 #endif
 #endif
-- 
1.6.4.1
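
For readers following the arithmetic in get_pll() and mpc85xx_cpufreq_cpu_init(), here is a minimal user-space sketch of the same field extraction and ratio-to-frequency conversion. It is not part of the patch. The function names pll_ratio_from_porpllsr() and core_khz() are hypothetical, the register value is passed in as a plain integer instead of being read with in_be32(), and the two-core limit is an assumption taken from the P1022 case.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout constants as in the patch. */
#define CORE_RATIO_STRIDE	8
#define CORE_RATIO_MASK		0x3f
#define CORE_RATIO_SHIFT	16

/*
 * Hypothetical helper: extract one core's PLL ratio from a raw
 * PORPLLSR value. Core 0 occupies bits 16..21 and core 1 occupies
 * bits 24..29, following shift = hw_cpu * STRIDE + SHIFT.
 */
static unsigned int pll_ratio_from_porpllsr(uint32_t porpllsr, int hw_cpu)
{
	int shift;

	assert(hw_cpu >= 0 && hw_cpu < 2);	/* assumption: at most two cores */
	shift = hw_cpu * CORE_RATIO_STRIDE + CORE_RATIO_SHIFT;

	return (porpllsr >> shift) & CORE_RATIO_MASK;
}

/*
 * Hypothetical helper: core clock in kHz for a ratio of "ratio:2",
 * i.e. sysfreq * ratio / 2. A 64-bit intermediate is used here so the
 * multiplication cannot wrap for large ratios.
 */
static uint32_t core_khz(uint64_t sysfreq_hz, unsigned int ratio)
{
	return (uint32_t)(sysfreq_hz * ratio / 2 / 1000);
}
```

With a 400 MHz platform clock and a ratio of 5 (that is, 5:2), core_khz() yields 1000000 kHz, which matches the "(sysfreq * index / 2) / 1000" expression used to fill the frequency table.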

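The PMJCR update in p1022_set_pll() can likewise be checked in isolation. The sketch below is an illustration only, not the driver code: p1022_pmjcr_next() is a hypothetical name, and it models clrsetbits_be32() as the pure expression (old & ~mask) | val on an integer rather than touching the hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Same constants as in the patch. */
#define CORE_RATIO_STRIDE	8
#define CORE_RATIO_MASK		0x3f
#define CORE_RATIO_SHIFT	16
#define PMJCR_CORE0_SPD		0x00001000
#define FREQ_500MHz		500000000

/*
 * Hypothetical helper: compute the PMJCR value that the P1022 path
 * would write for one core, given the old register contents.
 * Only that core's ratio field and its COREx_SPD bit are changed.
 */
static uint32_t p1022_pmjcr_next(uint32_t old, int hw_cpu,
				 unsigned int pll, uint64_t sysfreq_hz)
{
	int shift;
	uint32_t val, mask;

	assert(hw_cpu == 0 || hw_cpu == 1);	/* assumption: dual-core part */
	shift = hw_cpu * CORE_RATIO_STRIDE + CORE_RATIO_SHIFT;

	val = (uint32_t)(pll & CORE_RATIO_MASK) << shift;
	mask = ((uint32_t)CORE_RATIO_MASK << shift) |
	       ((uint32_t)PMJCR_CORE0_SPD << hw_cpu);

	/* Set COREx_SPD only above the 500 MHz threshold. */
	if (sysfreq_hz * pll / 2 > FREQ_500MHz)
		val |= (uint32_t)PMJCR_CORE0_SPD << hw_cpu;

	/* Equivalent of clrsetbits_be32(reg, mask, val) on a plain value. */
	return (old & ~mask) | val;
}
```

Because the mask covers only the selected core's fields, the other core's ratio and speed bit pass through unchanged, which is why the driver can retune one core at a time.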