LinuxPPC-Dev Archive on lore.kernel.org


* RE: RFC: top level compatibles for virtual platforms
From: Yoder Stuart-B08248 @ 2011-07-11 20:41 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: Tabi Timur-B04825, Alexander Graf, linuxppc-dev@ozlabs.org,
	Gala Kumar-B11780
In-Reply-To: <20110711130430.4b3036f6@schlenkerla.am.freescale.net>



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, July 11, 2011 1:05 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Tabi Timur-B04825; Grant Likely;
> Benjamin Herrenschmidt; Gala Kumar-B11780; Alexander Graf;
> linuxppc-dev@ozlabs.org
> Subject: Re: RFC: top level compatibles for virtual platforms
>
> On Mon, 11 Jul 2011 12:41:20 -0500
> Yoder Stuart-B08248 <B08248@freescale.com> wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Monday, July 11, 2011 11:24 AM
> > > To: Tabi Timur-B04825
> > > Cc: Yoder Stuart-B08248; Grant Likely; Benjamin Herrenschmidt; Gala
> > > Kumar-B11780; Wood Scott-B07421; Alexander Graf;
> > > linuxppc-dev@ozlabs.org
> > > Subject: Re: RFC: top level compatibles for virtual platforms
> > >
> > > On Mon, 11 Jul 2011 10:45:47 -0500
> > > Timur Tabi <timur@freescale.com> wrote:
> > >
> > > > >> Also, if these are KVM creations, shouldn't there be a "kvm" in
> > > > >> the compatible string somewhere?
> > > > >
> > > > > There is nothing KVM specific about these platforms.  Any
> > > > > hypervisor could create a similar virtual machine.
> > > >
> > > > True, but I think we're on a slippery slope, here.  Virtualization
> > > > allows us to create "virtual platforms" that are not well defined.
> > > > Linux requires a unique compatible string for each platform.
> > >
> > > The device tree is supposed to describe the hardware (virtual or
> > > otherwise), not just supply what Linux wants.  Perhaps there simply
> > > shouldn't be a toplevel compatible if there's nothing appropriate to
> > > describe there -- and fix whatever issues Linux has with that.
> >
> > But there is a concept in Linux of a platform 'machine':
>
> So have a Linux "machine" that is used when no other one matches.  That
> doesn't justify making something up in the device tree.
>
> > define_machine(p4080_ds) {
> >         .name                   = "P4080 DS",
> >         .probe                  = p4080_ds_probe,
> >         .setup_arch             = corenet_ds_setup_arch,
> >         .init_IRQ               = corenet_ds_pic_init,
> > #ifdef CONFIG_PCI
> >         .pcibios_fixup_bus      = fsl_pcibios_fixup_bus,
> > #endif
> >         .get_irq                = mpic_get_coreint_irq,
> >         .restart                = fsl_rstcr_restart,
> >         .calibrate_decr         = generic_calibrate_decr,
> >         .progress               = udbg_progress,
> > };
> >
> > Right now p4080_ds_probe needs something to match on to determine
> > whether this is the machine type.   How would it work if
> > there was no top level compatible to match on?   Some
> > platforms (e.g. e500v2-type) need mpc85xx_ds_pic_init(), others need
> > corenet_ds_pic_init().
>
> Just because Linux does it that way now doesn't mean it needs to.  The
> interrupt controller has a compatible property.  Match on it like any
> other device.  You can find which one is the root interrupt controller by
> looking for nodes with the interrupt-controller property that doesn't
> have an explicit interrupt-parent (or an interrupts property?  seems to
> be a conflict between ePAPR and the original interrupt mapping document).

This may be the right long term thing to do, but restructuring
how Linux powerpc platforms work is a bigger effort.  I was looking
for an incremental improvement over what we do now, which is pass
a compatible of MPC8544DS and P4080DS for these virtual platforms.

However, they _are_ compatible with MPC8544DS and P4080DS so maybe
leaving the compatible string alone is ok for now.
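
For reference, the lookup Scott describes above could be sketched roughly as
follows. This is only a simplified model in plain C, not kernel code: the
struct dt_node type, its fields and find_root_intc() are hypothetical names
made up for illustration, and a real implementation would walk the device
tree with the kernel's own helpers instead of a flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device tree node (illustration only). */
struct dt_node {
	const char *name;
	const char *compatible;                 /* first compatible string, or NULL */
	bool is_interrupt_controller;           /* has "interrupt-controller" */
	const struct dt_node *interrupt_parent; /* explicit interrupt-parent, or NULL */
};

/*
 * Return the first node that is an interrupt controller and has no
 * explicit interrupt-parent, i.e. the candidate root controller.
 * Returns NULL if no such node exists.
 */
const struct dt_node *find_root_intc(const struct dt_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].is_interrupt_controller &&
		    !nodes[i].interrupt_parent)
			return &nodes[i];
	}
	return NULL;
}
```

The platform code could then pick its init_IRQ routine from the returned
node's compatible string rather than from the top level compatible.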

Stuart

^ permalink raw reply

* Re: RFC: top level compatibles for virtual platforms
From: Scott Wood @ 2011-07-11 21:06 UTC (permalink / raw)
  To: Yoder Stuart-B08248
  Cc: Wood Scott-B07421, Tabi Timur-B04825, Alexander Graf,
	linuxppc-dev@ozlabs.org, Gala Kumar-B11780
In-Reply-To: <9F6FE96B71CF29479FF1CDC8046E150316FF6B@039-SN1MPN1-003.039d.mgd.msft.net>

On Mon, 11 Jul 2011 15:41:35 -0500
Yoder Stuart-B08248 <B08248@freescale.com> wrote:

> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Monday, July 11, 2011 1:05 PM
> > 
> > Just because Linux does it that way now doesn't mean it needs to.  The interrupt controller
> > has a compatible property.  Match on it like any other device.  You can find which one is the
> > root interrupt controller by looking for nodes with the interrupt-controller property that
> > doesn't have an explicit interrupt-parent (or an interrupts property?  seems to be a conflict
> > between ePAPR and the original interrupt mapping document).
> 
> This may be the right long term thing to do, but restructuring
> how Linux powerpc platforms work is a bigger effort.  I was looking
> for an incremental improvement over what we do now, which is pass
> a compatible of MPC8544DS and P4080DS for these virtual platforms.

A hack is usually easier than doing it right. :-)

Though often the effort required for the latter is overstated, and the
"right long term thing" never makes the jump to "short term plan".

There are a few things that need to be driven off the device tree that
currently aren't -- using some mechanism other than the standard
device model, if necessary (or as a first step) -- and then we need a
does-nothing default platform as the match of last resort.
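
A minimal sketch of what "match of last resort" could look like, assuming a
table of machine descriptors where a NULL compatible marks the catch-all
entry. The struct machine type and probe_machine() are hypothetical and only
illustrate the idea; they are not the kernel's define_machine() machinery.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical machine descriptor; not the kernel's struct machdep_calls. */
struct machine {
	const char *name;
	const char *compatible;	/* NULL means "matches anything" */
};

/*
 * Return the first machine whose compatible string equals the root
 * node's compatible.  If none matches, fall back to the first catch-all
 * entry (compatible == NULL), or NULL if the table has none.
 */
const struct machine *probe_machine(const struct machine *tbl, size_t n,
				    const char *root_compat)
{
	const struct machine *fallback = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].compatible) {
			if (!fallback)
				fallback = &tbl[i];
			continue;
		}
		if (root_compat && strcmp(tbl[i].compatible, root_compat) == 0)
			return &tbl[i];
	}
	return fallback;
}
```

With this shape, a device tree without any top level compatible still ends
up with a usable (if minimal) machine.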

> However, they _are_ compatible with MPC8544DS and P4080DS so maybe
> leaving the compatible string alone is ok for now.

The virtual platforms are not compatible with MPC8544DS or P4080DS.  Only a
subset of what is on those boards is provided.  And in the case of direct
device assignment, often the things that are present are incompatible (e.g.
different type of eTSEC).

-Scott

^ permalink raw reply

* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2011-07-12  1:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev list, Andrew Morton, Linux Kernel list

Hi Linus !

I almost forgot to send you these two patches I had around for some time
now. They fix a nasty warning at boot (and possibly more) when booting a
kernel with CONFIG_PPC_PSERIES enabled on a non-pseries machine.

The first patch just moves a duplicate #define to a common header file,
the second patch is the actual fix.

Cheers,
Ben.

The following changes since commit 620917de59eeb934b9f8cf35cc2d95c1ac8ed0fc:

  Linux 3.0-rc7 (2011-07-11 16:51:52 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Benjamin Herrenschmidt (2):
      mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
      powerpc/mm: Fix memory_block_size_bytes() for non-pseries

 arch/powerpc/platforms/pseries/hotplug-memory.c |   30 ++++++++++++++--------
 arch/x86/mm/init_64.c                           |    3 +-
 drivers/base/memory.c                           |    1 -
 include/linux/memory.h                          |    2 +
 4 files changed, 22 insertions(+), 14 deletions(-)

^ permalink raw reply

* Re: [v3 PATCH 1/1] booke/kprobe: make program exception to use one dedicated exception stack
From: tiejun.chen @ 2011-07-12  2:35 UTC (permalink / raw)
  To: benh, ananth; +Cc: Tiejun Chen, linuxppc-dev
In-Reply-To: <1310383915-30543-1-git-send-email-tiejun.chen@windriver.com>

Tiejun Chen wrote:
> When a kprobe is placed on an instruction that updates SP (r1), such as
> store-word-with-update,
> 
> stwu r1, -A(r1)
> 
> a program exception is triggered, and PPC always allocates an exception
> frame as follows:
> 
> old r1 ----------
>          ...
>          nip
>          gpr[2] ~ gpr[31]
>          gpr[1] <--------- old r1 is stored.
>          gpr[0]
>        -------- <--------- pt_regs @offset 16 bytes
>        padding
>        STACK_FRAME_REGS_MARKER
>        LR
>        back chain
> new r1 ----------
> Then emulate_step() will emulate this instruction, 'stwu'. Actually it is
> equivalent to:
> 1> Update pt_regs->gpr[1] = old r1 + (-A)
> 2> Store old r1 to mem[old r1 + (-A)]
> 
> Please notice that the stack frame based on new r1 may overlap
> mem[old r1 + (-A)] when addr[old r1 + (-A)] < addr[old r1 + sizeof(an exception frame)].
> So operation 2> above will overwrite part of this exception frame, and
> unexpected kernel problems will follow.
> 
> So it looks like we have to implement an independent interrupt stack for the
> PPC program exception when CONFIG_BOOKE is enabled. Here we can use
> EXC_LEVEL_EXCEPTION_PROLOG to replace the original NORMAL_EXCEPTION_PROLOG
> for the program exception if CONFIG_BOOKE. Then it is always safe for kprobes,
> with an independent exception stack from one pre-allocated and dedicated
> thread_info. Actually this is just what we did for critical/machine check
> exceptions on PPC.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
>  arch/powerpc/include/asm/irq.h   |    3 +++
>  arch/powerpc/include/asm/reg.h   |    4 ++++
>  arch/powerpc/kernel/entry_32.S   |   15 +++++++++++++++
>  arch/powerpc/kernel/head_booke.h |   15 +++++++++++++--
>  arch/powerpc/kernel/irq.c        |   11 +++++++++++
>  arch/powerpc/kernel/setup_32.c   |    4 ++++
>  6 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
> index 1bff591..6d12169 100644
> --- a/arch/powerpc/include/asm/irq.h
> +++ b/arch/powerpc/include/asm/irq.h
> @@ -313,6 +313,9 @@ struct pt_regs;
>  extern struct thread_info *critirq_ctx[NR_CPUS];
>  extern struct thread_info *dbgirq_ctx[NR_CPUS];
>  extern struct thread_info *mcheckirq_ctx[NR_CPUS];
> +#if defined(CONFIG_KPROBES) && defined(CONFIG_BOOKE)
> +extern struct thread_info *pgirq_ctx[NR_CPUS];
> +#endif
>  extern void exc_lvl_ctx_init(void);
>  #else
>  #define exc_lvl_ctx_init()
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index c5cae0d..34d6178 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -885,6 +885,10 @@
>  #endif
>  #define SPRN_SPRG_RVCPU		SPRN_SPRG1
>  #define SPRN_SPRG_WVCPU		SPRN_SPRG1
> +#ifdef	CONFIG_KPROBES
> +#define	SPRN_SPRG_RSCRATCH_PG	SPRN_SPRG0
> +#define	SPRN_SPRG_WSCRATCH_PG	SPRN_SPRG0
> +#endif
>  #endif
>  
>  #ifdef CONFIG_8xx
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 56212bc..a99e209 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -1122,6 +1122,21 @@ ret_from_mcheck_exc:
>  	RESTORE_xSRR(DSRR0,DSRR1);
>  	RESTORE_MMU_REGS;
>  	RET_FROM_EXC_LEVEL(SPRN_MCSRR0, SPRN_MCSRR1, PPC_RFMCI)
> +
> +	.globl	ret_from_prog_exc
> +ret_from_prog_exc:
> +	mfspr	r9,SPRN_SPRG_THREAD
> +	lwz	r10,SAVED_KSP_LIMIT(r1)
> +	stw	r10,KSP_LIMIT(r9)
> +	lwz	r9,THREAD_INFO-THREAD(r9)
> +	rlwinm	r10,r1,0,0,(31-THREAD_SHIFT)
> +	lwz	r10,TI_PREEMPT(r10)
> +	stw	r10,TI_PREEMPT(r9)
> +	RESTORE_xSRR(SRR0,SRR1);
> +	RESTORE_xSRR(CSRR0,CSRR1);
> +	RESTORE_xSRR(DSRR0,DSRR1);
> +	RESTORE_MMU_REGS;
> +	RET_FROM_EXC_LEVEL(SPRN_SRR0, SPRN_SRR1, rfi)
>  #endif /* CONFIG_BOOKE */

After further consideration, I would improve the above code fragment as follows:
------
+
+       .globl  ret_from_prog_exc
+ret_from_prog_exc:
+#ifdef CONFIG_KPROBES
+       mfspr   r9,SPRN_SPRG_THREAD
+       lwz     r9,THREAD_INFO-THREAD(r9)
+       rlwinm  r10,r1,0,0,(31-THREAD_SHIFT)
+       lwz     r10,TI_PREEMPT(r10)
+       stw     r10,TI_PREEMPT(r9)
+       RET_FROM_EXC_LEVEL(SPRN_SRR0, SPRN_SRR1, rfi)
+#else
+       b       ret_from_except_full
+#endif

This removes the unnecessary restores, and also makes sure the path is still
the same as the normal program exception when !CONFIG_KPROBES.
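
To make the clobbering problem from the commit message concrete, here is a
small model in C. It assumes the exception frame occupies exactly the range
[new r1, old r1) as in the diagram above; the function name and the frame
size passed in are hypothetical, and this is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model: the exception frame occupies [new_r1, old_r1), with
 * new_r1 = old_r1 - frame_size.  "stwu r1,-A(r1)" stores a word at
 * old_r1 - A.  Return true if that store lands inside the frame.
 * Assumes A <= old_r1 and frame_size <= old_r1 (no wraparound).
 */
bool stwu_clobbers_frame(uint32_t old_r1, uint32_t a, uint32_t frame_size)
{
	uint32_t new_r1 = old_r1 - frame_size;
	uint32_t target = old_r1 - a;

	return a > 0 && target >= new_r1 && target < old_r1;
}
```

In this model the emulated store hits the frame whenever 0 < A <= frame
size, which is why a separate exception stack avoids the problem.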

Tiejun

>  
>  /*
> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
> index a0bf158..941be40 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -79,6 +79,10 @@
>  /* only on e500mc/e200 */
>  #define DBG_STACK_BASE		dbgirq_ctx
>  
> +#if defined(CONFIG_KPROBES)
> +#define PG_STACK_BASE		pgirq_ctx
> +#endif
> +
>  #define EXC_LVL_FRAME_OVERHEAD	(THREAD_SIZE - INT_FRAME_SIZE - EXC_LVL_SIZE)
>  
>  #ifdef CONFIG_SMP
> @@ -158,6 +162,12 @@
>  		EXC_LEVEL_EXCEPTION_PROLOG(DBG, SPRN_DSRR0, SPRN_DSRR1)
>  #define MCHECK_EXCEPTION_PROLOG \
>  		EXC_LEVEL_EXCEPTION_PROLOG(MC, SPRN_MCSRR0, SPRN_MCSRR1)
> +#if defined(CONFIG_KPROBES)
> +#define	PROGRAM_EXCEPTION_PROLOG \
> +		EXC_LEVEL_EXCEPTION_PROLOG(PG, SPRN_SRR0, SPRN_SRR1)
> +#else
> +#define	PROGRAM_EXCEPTION_PROLOG	NORMAL_EXCEPTION_PROLOG
> +#endif
>  
>  /*
>   * Exception vectors.
> @@ -370,11 +380,12 @@ label:
>  
>  #define PROGRAM_EXCEPTION						      \
>  	START_EXCEPTION(Program)					      \
> -	NORMAL_EXCEPTION_PROLOG;					      \
> +	PROGRAM_EXCEPTION_PROLOG;					      \
>  	mfspr	r4,SPRN_ESR;		/* Grab the ESR and save it */	      \
>  	stw	r4,_ESR(r11);						      \
>  	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
> -	EXC_XFER_STD(0x0700, program_check_exception)
> +	EXC_XFER_TEMPLATE(program_check_exception, 0x0700, MSR_KERNEL, NOCOPY,\
> +		transfer_to_handler_full, ret_from_prog_exc)
>  
>  #define DECREMENTER_EXCEPTION						      \
>  	START_EXCEPTION(Decrementer)					      \
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 5b428e3..ff5b8dd 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -397,6 +397,10 @@ struct thread_info   *critirq_ctx[NR_CPUS] __read_mostly;
>  struct thread_info    *dbgirq_ctx[NR_CPUS] __read_mostly;
>  struct thread_info *mcheckirq_ctx[NR_CPUS] __read_mostly;
>  
> +#if defined(CONFIG_KPROBES) && defined(CONFIG_BOOKE)
> +struct thread_info    *pgirq_ctx[NR_CPUS] __read_mostly;
> +#endif
> +
>  void exc_lvl_ctx_init(void)
>  {
>  	struct thread_info *tp;
> @@ -423,6 +427,13 @@ void exc_lvl_ctx_init(void)
>  		tp = mcheckirq_ctx[cpu_nr];
>  		tp->cpu = cpu_nr;
>  		tp->preempt_count = HARDIRQ_OFFSET;
> +
> +#if defined(CONFIG_KPROBES)
> +		memset((void *)pgirq_ctx[i], 0, THREAD_SIZE);
> +		tp = pgirq_ctx[i];
> +		tp->cpu = i;
> +		tp->preempt_count = 0;
> +#endif
>  #endif
>  	}
>  }
> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> index 620d792..b872564 100644
> --- a/arch/powerpc/kernel/setup_32.c
> +++ b/arch/powerpc/kernel/setup_32.c
> @@ -272,6 +272,10 @@ static void __init exc_lvl_early_init(void)
>  			__va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
>  		mcheckirq_ctx[hw_cpu] = (struct thread_info *)
>  			__va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
> +#ifdef CONFIG_KPROBES
> +		pgirq_ctx[hw_cpu] = (struct thread_info *)
> +			__va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
> +#endif
>  #endif
>  	}
>  }

^ permalink raw reply

* Re: [PATCH] powerpc: add denormalisation exception handling for POWER6/7
From: Kumar Gala @ 2011-07-12  3:51 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Paul Mackerras, miltonm, linuxppc-dev
In-Reply-To: <11738.1310363538@neuling.org>


On Jul 11, 2011, at 12:52 AM, Michael Neuling wrote:

> On POWER6 and POWER7 if the input operand to an instruction is a
> denormalised single precision binary floating we can take a
> denormalisation exception where it's expected that the hypervisor (HV=1)
> will fix up the inputs before the instruction is run.
>
> This adds code to handle this denormalisation exception for POWER6 and
> POWER7.
>
> It also add a CONFIG_PPC_DENORMALISATION option and sets it in
> pseries/ppc64_defconfig.
>
> This is useful on bare metal systems only.  Based on patch from Milton
> Miller.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
>
> ---
> arch/powerpc/Kconfig                   |    7 +
> arch/powerpc/configs/ppc64_defconfig   |    1
> arch/powerpc/configs/pseries_defconfig |    1
> arch/powerpc/include/asm/ppc-opcode.h  |    2
> arch/powerpc/include/asm/reg.h         |    1
> arch/powerpc/kernel/exceptions-64s.S   |  125 +++++++++++++++++++++++++++++++++
> 6 files changed, 137 insertions(+)

Is it possible to run POWER6/7 systems in baremetal if you are not IBM?

Just wondering if this is useful to anyone but IBM.

- k

^ permalink raw reply

* Re: [PATCH] powerpc: add denormalisation exception handling for POWER6/7
From: Benjamin Herrenschmidt @ 2011-07-12  4:12 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras, miltonm
In-Reply-To: <9875FD87-1762-4CCE-8A34-7BAC2C42E9A6@kernel.crashing.org>

On Mon, 2011-07-11 at 22:51 -0500, Kumar Gala wrote:
> On Jul 11, 2011, at 12:52 AM, Michael Neuling wrote:
> 
> > On POWER6 and POWER7 if the input operand to an instruction is a
> > denormalised single precision binary floating we can take a
> > denormalisation exception where it's expected that the hypervisor (HV=1)
> > will fix up the inputs before the instruction is run.
> > 
> > This adds code to handle this denormalisation exception for POWER6 and
> > POWER7.
> > 
> > It also add a CONFIG_PPC_DENORMALISATION option and sets it in
> > pseries/ppc64_defconfig. 
> > 
> > This is useful on bare metal systems only.  Based on patch from Milton
> > Miller.
> > 
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > 
> > ---
> > arch/powerpc/Kconfig                   |    7 +
> > arch/powerpc/configs/ppc64_defconfig   |    1 
> > arch/powerpc/configs/pseries_defconfig |    1 
> > arch/powerpc/include/asm/ppc-opcode.h  |    2 
> > arch/powerpc/include/asm/reg.h         |    1 
> > arch/powerpc/kernel/exceptions-64s.S   |  125 +++++++++++++++++++++++++++++++++
> > 6 files changed, 137 insertions(+)
> 
> Is it possible to run POWER6/7 systems in baremetal if you are not IBM?
> 
> Just wondering if this is useful to anyone but IBM.

While I obviously cannot comment on unreleased systems, the fact that
we've been contributing all sorts of patches to run P7 in "HV" mode and
KVM on them should be enough of a hint :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC PATCH] powerpc: 85xx: Make e500/e500v2 depend on !E500MC
From: Baruch Siach @ 2011-07-12  4:15 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <fef1cfc9a418ec5aa3302915dcb392882f7dd5d2.1308545584.git.baruch@tkos.co.il>

Hi Kumar,

On Mon, Jun 20, 2011 at 07:56:10AM +0300, Baruch Siach wrote:
> CONFIG_E500MC breaks e500/e500v2 systems. It defines L1_CACHE_SHIFT to 6, thus
> breaking clear_pages(), probably others too.

Ping?
Ack/Nack?

baruch

> Cc: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
> Is this the right approach?
> 
>  arch/powerpc/platforms/85xx/Kconfig |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
> index b6976e1..5b8546d 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -13,6 +13,8 @@ if FSL_SOC_BOOKE
>  
>  if PPC32
>  
> +if !PPC_E500MC
> +
>  config MPC8540_ADS
>  	bool "Freescale MPC8540 ADS"
>  	select DEFAULT_UIMAGE
> @@ -155,6 +157,8 @@ config SBC8560
>  	help
>  	  This option enables support for the Wind River SBC8560 board
>  
> +endif # !PPC_E500MC
> +
>  config P3041_DS
>  	bool "Freescale P3041 DS"
>  	select DEFAULT_UIMAGE
> -- 
> 1.7.5.3

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply

* [PATCH v2] kexec-tools: ppc32: Fixup the ThreadPointer for purgatory code.
From: Suzuki K. Poulose @ 2011-07-12  4:50 UTC (permalink / raw)
  To: Simon Horman
  Cc: Suzuki K. Poulose, kexec, Josh Boyer, Paul Mackerras,
	linux ppc dev, Vivek Goyal

The PPC32 ELF ABI expects r2 to be loaded with the Thread Pointer (TP), which
is 0x7000 bytes past the end of the TCB. Though the purgatory is single
threaded, it uses TCB scratch space in vsnprintf(). This patch allocates a
1024-byte TCB and populates the TP with the corresponding address.
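
As a quick illustration of the arithmetic, the sketch below mirrors the
"addr += TCB_SIZE + TCB_TP_OFFSET" step from the diff. The helper name
thread_pointer() is made up for this example and is not part of kexec-tools.

```c
#include <stdint.h>

#define TCB_SIZE	1024u	/* size of the TCB allocated by the patch */
#define TCB_TP_OFFSET	0x7000u	/* PPC32 ELF ABI: TP is 0x7000 past the TCB end */

/* Given the start address of the TCB hole, return the value loaded into r2. */
uint32_t thread_pointer(uint32_t tcb_start)
{
	return tcb_start + TCB_SIZE + TCB_TP_OFFSET;
}
```

For a TCB starting at 0x100000, this gives a TP of 0x107400.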

Changes from V1: Fixed the addr calculation for uImage support.


Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Ryan S. Arnold <rsa@us.ibm.com>
---

 kexec/arch/ppc/kexec-elf-ppc.c     |    9 +++++++++
 kexec/arch/ppc/kexec-uImage-ppc.c  |    8 ++++++++
 purgatory/arch/ppc/purgatory-ppc.c |    2 +-
 purgatory/arch/ppc/v2wrap_32.S     |    4 ++++
 4 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/kexec/arch/ppc/kexec-elf-ppc.c b/kexec/arch/ppc/kexec-elf-ppc.c
index f4443b4..3a4b59b 100644
--- a/kexec/arch/ppc/kexec-elf-ppc.c
+++ b/kexec/arch/ppc/kexec-elf-ppc.c
@@ -414,6 +414,15 @@ int elf_ppc_load(int argc, char **argv,	const char *buf, off_t len,
 	elf_rel_set_symbol(&info->rhdr, "stack", &addr, sizeof(addr));
 #undef PUL_STACK_SIZE
 
+#define TCB_SIZE 1024
+#define TCB_TP_OFFSET 0x7000	/* PPC32 ELF ABI */
+
+	addr = locate_hole(info, TCB_SIZE, 0, 0, elf_max_addr(&ehdr), 1);
+	addr += TCB_SIZE + TCB_TP_OFFSET;
+	elf_rel_set_symbol(&info->rhdr, "my_thread_ptr", &addr, sizeof(addr));
+#undef TCB_SIZE
+#undef TCB_TP_OFFSET
+
 	addr = elf_rel_get_addr(&info->rhdr, "purgatory_start");
 	info->entry = (void *)addr;
 #endif
diff --git a/kexec/arch/ppc/kexec-uImage-ppc.c b/kexec/arch/ppc/kexec-uImage-ppc.c
index 1d71374..4923c83 100644
--- a/kexec/arch/ppc/kexec-uImage-ppc.c
+++ b/kexec/arch/ppc/kexec-uImage-ppc.c
@@ -228,6 +228,14 @@ static int ppc_load_bare_bits(int argc, char **argv, const char *buf,
 	/* No allocation past here in order not to overwrite the stack */
 #undef PUL_STACK_SIZE
 
+#define TCB_SIZE 	1024
+#define TCB_TP_OFFSET 	0x7000
+	addr = locate_hole(info, TCB_SIZE, 0, 0, -1, 1);
+	addr += TCB_SIZE + TCB_TP_OFFSET;
+	elf_rel_set_symbol(&info->rhdr, "my_thread_ptr", &addr, sizeof(addr));
+#undef TCB_TP_OFFSET
+#undef TCB_SIZE
+
 	addr = elf_rel_get_addr(&info->rhdr, "purgatory_start");
 	info->entry = (void *)addr;
 
diff --git a/purgatory/arch/ppc/purgatory-ppc.c b/purgatory/arch/ppc/purgatory-ppc.c
index 349e750..3e6b354 100644
--- a/purgatory/arch/ppc/purgatory-ppc.c
+++ b/purgatory/arch/ppc/purgatory-ppc.c
@@ -26,7 +26,7 @@ unsigned int panic_kernel = 0;
 unsigned long backup_start = 0;
 unsigned long stack = 0;
 unsigned long dt_offset = 0;
-unsigned long my_toc = 0;
+unsigned long my_thread_ptr = 0;
 unsigned long kernel = 0;
 
 void setup_arch(void)
diff --git a/purgatory/arch/ppc/v2wrap_32.S b/purgatory/arch/ppc/v2wrap_32.S
index 8442d16..8b60677 100644
--- a/purgatory/arch/ppc/v2wrap_32.S
+++ b/purgatory/arch/ppc/v2wrap_32.S
@@ -56,6 +56,10 @@ master:
 	mr      17,3            # save cpu id to r17
 	mr      15,4            # save physical address in reg15
 
+	lis	6,my_thread_ptr@h
+	ori	6,6,my_thread_ptr@l
+	lwz	2,0(6)		# setup ThreadPointer(TP)
+
 	lis	6,stack@h
 	ori	6,6,stack@l
 	lwz     1,0(6)          #setup stack

^ permalink raw reply related

* [PATCH v2] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: b35362 @ 2011-07-12  4:48 UTC (permalink / raw)
  To: dwmw2; +Cc: Liu Shuo, linuxppc-dev, linux-mtd

From: Liu Shuo <b35362@freescale.com>

The Freescale FCM controller has a 2K-byte limit on its buffer RAM. In order
to support NAND flash chips whose page size is larger than 2K bytes, we
present each physical page to the MTD layer as multiple 2K pages, so the
page size seen by MTD is forced to 2K bytes. In the fsl_elbc driver, we
convert the MTD-layer page address into a real page address in the flash
chip plus a column index. Any column address can be issued with the UA
instruction of the eLBC controller.
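
The address conversion can be sketched on its own as below. This is a
simplified standalone model of the set_addr() logic in the diff, assuming a
power-of-two writesize of at least 2048 and the 2048 + 64 = 2112 byte stride
used by the patch; struct fcm_addr and the function names are hypothetical.

```c
#include <stdint.h>

struct fcm_addr {
	uint32_t real_page;	/* page index in the physical chip */
	uint32_t real_column;	/* byte offset inside that physical page */
};

/*
 * Smallest shift such that (2048 << shift) == writesize.
 * Assumes writesize is a power of two and >= 2048.
 */
unsigned int subpage_shift(uint32_t writesize)
{
	unsigned int shift = 0;

	while ((2048u << shift) < writesize)
		shift++;
	return shift;
}

/* Map a 2K "MTD page" onto the real page and column of the chip. */
struct fcm_addr fcm_translate(uint32_t mtd_page, uint32_t writesize, int oob)
{
	unsigned int shift = subpage_shift(writesize);
	uint32_t mask = (1u << shift) - 1;
	struct fcm_addr a;

	a.real_page = mtd_page >> shift;
	/* Each 2K chunk is followed by its 64 OOB bytes: stride 2112. */
	a.real_column = (mtd_page & mask) * 2112u + (oob ? 2048u : 0u);
	return a;
}
```

For a 4K-page chip, MTD page 5 maps to real page 2 at column 2112, and its
OOB area starts at column 4160.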

Signed-off-by: Liu Shuo <b35362@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
---
 drivers/mtd/nand/fsl_elbc_nand.c |   66 ++++++++++++++++++++++++++++++-------
 1 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c
index a212116..884a9f1 100644
--- a/drivers/mtd/nand/fsl_elbc_nand.c
+++ b/drivers/mtd/nand/fsl_elbc_nand.c
@@ -76,6 +76,13 @@ struct fsl_elbc_fcm_ctrl {
 	unsigned int oob;        /* Non zero if operating on OOB data     */
 	unsigned int counter;	 /* counter for the initializations	  */
 	char *oob_poi;           /* Place to write ECC after read back    */
+
+	/*
+	 * If writesize > 2048, these two members are used to calculate
+	 * the real page address and real column address.
+	 */
+	int subpage_shift;
+	int subpage_mask;
 };
 
 /* These map to the positions used by the FCM hardware ECC generator */
@@ -164,18 +171,27 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
 	struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = ctrl->nand;
 	int buf_num;
+	u32 real_ca = column;
 
-	elbc_fcm_ctrl->page = page_addr;
+	if (priv->page_size && elbc_fcm_ctrl->subpage_shift) {
+		real_ca = (page_addr & elbc_fcm_ctrl->subpage_mask) * 2112;
+		page_addr >>= elbc_fcm_ctrl->subpage_shift;
+	}
 
-	out_be32(&lbc->fbar,
-	         page_addr >> (chip->phys_erase_shift - chip->page_shift));
+	elbc_fcm_ctrl->page = page_addr;
 
 	if (priv->page_size) {
+		real_ca += (oob ? 2048 : 0);
+		elbc_fcm_ctrl->use_mdr = 1;
+		elbc_fcm_ctrl->mdr = real_ca;
+
+		out_be32(&lbc->fbar, page_addr >> 6);
 		out_be32(&lbc->fpar,
 		         ((page_addr << FPAR_LP_PI_SHIFT) & FPAR_LP_PI) |
 		         (oob ? FPAR_LP_MS : 0) | column);
 		buf_num = (page_addr & 1) << 2;
 	} else {
+		out_be32(&lbc->fbar, page_addr >> 5);
 		out_be32(&lbc->fpar,
 		         ((page_addr << FPAR_SP_PI_SHIFT) & FPAR_SP_PI) |
 		         (oob ? FPAR_SP_MS : 0) | column);
@@ -256,10 +272,11 @@ static void fsl_elbc_do_read(struct nand_chip *chip, int oob)
 	if (priv->page_size) {
 		out_be32(&lbc->fir,
 		         (FIR_OP_CM0 << FIR_OP0_SHIFT) |
-		         (FIR_OP_CA  << FIR_OP1_SHIFT) |
-		         (FIR_OP_PA  << FIR_OP2_SHIFT) |
-		         (FIR_OP_CM1 << FIR_OP3_SHIFT) |
-		         (FIR_OP_RBW << FIR_OP4_SHIFT));
+		         (FIR_OP_UA  << FIR_OP1_SHIFT) |
+		         (FIR_OP_UA  << FIR_OP2_SHIFT) |
+		         (FIR_OP_PA  << FIR_OP3_SHIFT) |
+		         (FIR_OP_CM1 << FIR_OP4_SHIFT) |
+		         (FIR_OP_RBW << FIR_OP5_SHIFT));
 
 		out_be32(&lbc->fcr, (NAND_CMD_READ0 << FCR_CMD0_SHIFT) |
 		                    (NAND_CMD_READSTART << FCR_CMD1_SHIFT));
@@ -399,12 +416,13 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 		if (priv->page_size) {
 			out_be32(&lbc->fir,
 			         (FIR_OP_CM2 << FIR_OP0_SHIFT) |
-			         (FIR_OP_CA  << FIR_OP1_SHIFT) |
-			         (FIR_OP_PA  << FIR_OP2_SHIFT) |
-			         (FIR_OP_WB  << FIR_OP3_SHIFT) |
-			         (FIR_OP_CM3 << FIR_OP4_SHIFT) |
-			         (FIR_OP_CW1 << FIR_OP5_SHIFT) |
-			         (FIR_OP_RS  << FIR_OP6_SHIFT));
+			         (FIR_OP_UA  << FIR_OP1_SHIFT) |
+			         (FIR_OP_UA  << FIR_OP2_SHIFT) |
+			         (FIR_OP_PA  << FIR_OP3_SHIFT) |
+			         (FIR_OP_WB  << FIR_OP4_SHIFT) |
+			         (FIR_OP_CM3 << FIR_OP5_SHIFT) |
+			         (FIR_OP_CW1 << FIR_OP6_SHIFT) |
+			         (FIR_OP_RS  << FIR_OP7_SHIFT));
 		} else {
 			out_be32(&lbc->fir,
 			         (FIR_OP_CM0 << FIR_OP0_SHIFT) |
@@ -453,6 +471,9 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 			full_page = 1;
 		}
 
+		if (priv->page_size)
+			elbc_fcm_ctrl->use_mdr = 1;
+
 		fsl_elbc_run_command(mtd);
 
 		/* Read back the page in order to fill in the ECC for the
@@ -654,9 +675,28 @@ static int fsl_elbc_chip_init_tail(struct mtd_info *mtd)
 	struct nand_chip *chip = mtd->priv;
 	struct fsl_elbc_mtd *priv = chip->priv;
 	struct fsl_lbc_ctrl *ctrl = priv->ctrl;
+	struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = ctrl->nand;
 	struct fsl_lbc_regs __iomem *lbc = ctrl->regs;
 	unsigned int al;
 
+	/*
+	 * Hack for supporting the flash chip whose writesize is
+	 * larger than 2K bytes.
+	 */
+	if (mtd->writesize > 2048) {
+		elbc_fcm_ctrl->subpage_shift = ffs(mtd->writesize >> 11) - 1;
+		elbc_fcm_ctrl->subpage_mask =
+			(1 << elbc_fcm_ctrl->subpage_shift) - 1;
+		/*
+		 * Rewrite mtd->writesize, mtd->oobsize, chip->page_shift
+		 * and chip->pagemask.
+		 */
+		mtd->writesize = 2048;
+		mtd->oobsize = 64;
+		chip->page_shift = ffs(mtd->writesize) - 1;
+		chip->pagemask = (chip->chipsize >> chip->page_shift) - 1;
+	}
+
 	/* calculate FMR Address Length field */
 	al = 0;
 	if (chip->pagemask & 0xffff0000)
-- 
1.7.1

^ permalink raw reply related

* [PATCH v2] powerpc32: Kexec support for PPC440X chipsets
From: Suzuki K. Poulose @ 2011-07-12  6:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala
  Cc: Suzuki Poulose, Sebastian Andrzej Siewior, kexec, lkml,
	Josh Boyer, Paul Mackerras, linux ppc dev, Vivek Goyal

Changes from V1: Uses a tmp mapping in the other address space to setup
		 the 1:1 mapping (suggested by Sebastian Andrzej Siewior).

Note: Should we do the same for kernel entry code for PPC44x ?

This patch adds kexec support for PPC440 based chipsets. This work is based
on the KEXEC patches for FSL BookE.

The FSL BookE patch and the code flow can be found at the link below:

	http://patchwork.ozlabs.org/patch/49359/

Steps:

1) Invalidate all the TLB entries except the one this code is run from
2) Create a tmp mapping for our code in the other address space and jump to it
3) Invalidate the entry we used
4) Create a 1:1 mapping for 0-2GiB in blocks of 256M
5) Jump to the new 1:1 mapping and invalidate the tmp mapping
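
Step 4 amounts to eight identity entries of 256M each. A rough C model of
that loop is shown below; the real code in the patch is assembly writing TLB
entries with tlbwe, and struct tlb_map and build_identity_map() are
hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_BLOCK	0x10000000u	/* 256M per TLB entry */
#define MAP_LIMIT	0x80000000u	/* map the first 2GiB */

struct tlb_map {
	uint32_t virt;
	uint32_t phys;
};

/*
 * Fill out[] with identity (virt == phys) mappings covering 0-2GiB in
 * 256M blocks.  Returns the number of entries written (at most max).
 */
size_t build_identity_map(struct tlb_map *out, size_t max)
{
	size_t n = 0;

	for (uint32_t addr = 0; addr < MAP_LIMIT && n < max; addr += MAP_BLOCK) {
		out[n].virt = addr;
		out[n].phys = addr;
		n++;
	}
	return n;
}
```

With enough room in out[], this produces 8 entries, the last one starting at
0x70000000.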

I have tested this patch on Ebony and Sequoia boards and on Virtex on QEMU.
It would be great if somebody could test this on the other boards.

Signed-off-by: 	Suzuki Poulose <suzuki@in.ibm.com>
Cc:	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

 arch/powerpc/Kconfig             |    2 
 arch/powerpc/include/asm/kexec.h |    2 
 arch/powerpc/kernel/misc_32.S    |  170 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 423145a6..d04fae0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -349,7 +349,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
-	depends on (PPC_BOOK3S || FSL_BOOKE) && EXPERIMENTAL
+	depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP && !47x)) && EXPERIMENTAL
 	help
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel.  It is like a reboot
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 8a33698..f921eb1 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -2,7 +2,7 @@
 #define _ASM_POWERPC_KEXEC_H
 #ifdef __KERNEL__
 
-#ifdef CONFIG_FSL_BOOKE
+#if defined(CONFIG_FSL_BOOKE) || defined(CONFIG_44x)
 
 /*
  * On FSL-BookE we setup a 1:1 mapping which covers the first 2GiB of memory
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 998a100..a7881ab 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -8,6 +8,7 @@
  * kexec bits:
  * Copyright (C) 2002-2003 Eric Biederman  <ebiederm@xmission.com>
  * GameCube/ppc32 port Copyright (C) 2004 Albert Herranz
+ * PPC440x port  Copyright (C) 2011 Suzuki Poulose <suzuki@in.ibm.com>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -736,6 +737,175 @@ relocate_new_kernel:
 	mr      r5, r31
 
 	li	r0, 0
+#elif defined(CONFIG_44x)  && !defined(CONFIG_47x)
+
+/*
+ * Code for setting up 1:1 mapping for PPC440x for KEXEC
+ *
+ * We cannot switch off the MMU on PPC44x.
+ * So we:
+ * 1) Invalidate all the mappings except the one we are running from.
+ * 2) Create a tmp mapping for our code in the other address space(TS) and
+ *    jump to it. Invalidate the entry we started in.
+ * 3) Create a 1:1 mapping for 0-2GiB in chunks of 256M in original TS.
+ * 4) Jump to the 1:1 mapping in original TS.
+ * 5) Invalidate the tmp mapping.
+ *
+ * - Based on the kexec support code for FSL BookE
+ * - Doesn't support 47x yet.
+ *
+ */
+	/* Save our parameters */
+	mr	r29, r3
+	mr	r30, r4
+	mr	r31, r5
+
+	/* Load our MSR_IS and TID to MMUCR for TLB search */
+	mfspr	r3,SPRN_PID
+	mfmsr	r4
+	andi.	r4,r4,MSR_IS@l
+	beq	wmmucr
+	oris	r3,r3,PPC44x_MMUCR_STS@h
+wmmucr:
+	mtspr	SPRN_MMUCR,r3
+	sync
+
+	/*
+	 * Invalidate all the TLB entries except the current entry
+	 * where we are running from
+	 */
+	bl	0f				/* Find our address */
+0:	mflr	r5				/* Make it accessible */
+	tlbsx	r23,0,r5			/* Find entry we are in */
+	li	r4,0				/* Start at TLB entry 0 */
+	li	r3,0				/* Set PAGEID inval value */
+1:	cmpw	r23,r4				/* Is this our entry? */
+	beq	skip				/* If so, skip the inval */
+	tlbwe	r3,r4,PPC44x_TLB_PAGEID		/* If not, inval the entry */
+skip:
+	addi	r4,r4,1				/* Increment */
+	cmpwi	r4,64				/* Are we done?	*/
+	bne	1b				/* If not, repeat */
+	isync
+
+	/* Create a temp mapping and jump to it */
+	andi.	r6, r23, 1		/* Find the index to use */
+	addi	r24, r6, 1		/* r24 will contain 1 or 2 */
+
+	mfmsr	r9			/* get the MSR */
+	rlwinm	r5, r9, 27, 31, 31	/* Extract the MSR[IS] */
+	xori	r7, r5, 1		/* Use the other address space */
+
+	/* Read the current mapping entries */
+	tlbre	r3, r23, PPC44x_TLB_PAGEID
+	tlbre	r4, r23, PPC44x_TLB_XLAT
+	tlbre	r5, r23, PPC44x_TLB_ATTRIB
+
+	/* Save our current XLAT entry */
+	mr	r25, r4
+
+	/* Extract the TLB PageSize */
+	li	r10, 1 			/* r10 will hold PageSize */
+	rlwinm	r11, r3, 0, 24, 27	/* bits 24-27 */
+
+	/* XXX: As of now we use 256M, 4K pages */
+	cmpwi	r11, PPC44x_TLB_256M
+	bne	tlb_4k
+	rotlwi	r10, r10, 28		/* r10 = 256M */
+	b	write_out
+tlb_4k:
+	cmpwi	r11, PPC44x_TLB_4K
+	bne	default
+	rotlwi	r10, r10, 12		/* r10 = 4K */
+	b	write_out
+default:
+	rotlwi	r10, r10, 10		/* r10 = 1K */
+
+write_out:
+	/*
+	 * Write out the tmp 1:1 mapping for this code in other address space
+	 * Fixup  EPN = RPN , TS=other address space
+	 */
+	insrwi	r3, r7, 1, 23		/* Bit 23 is TS for PAGEID field */
+
+	/* Write out the tmp mapping entries */
+	tlbwe	r3, r24, PPC44x_TLB_PAGEID
+	tlbwe	r4, r24, PPC44x_TLB_XLAT
+	tlbwe	r5, r24, PPC44x_TLB_ATTRIB
+
+	subi	r11, r10, 1		/* PageOffset Mask = PageSize - 1 */
+	not	r10, r11		/* Mask for PageNum */
+
+	/* Switch to other address space in MSR */
+	insrwi	r9, r7, 1, 26		/* Set MSR[IS] = r7 */
+
+	bl	1f
+1:	mflr	r8
+	addi	r8, r8, (2f-1b)		/* Find the target offset */
+
+	/* Jump to the tmp mapping */
+	mtspr	SPRN_SRR0, r8
+	mtspr	SPRN_SRR1, r9
+	rfi
+
+2:
+	/* Invalidate the entry we were executing from */
+	li	r3, 0
+	tlbwe	r3, r23, PPC44x_TLB_PAGEID
+
+	/* attribute fields. rwx for SUPERVISOR mode */
+	li	r5, 0
+	ori	r5, r5, (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
+
+	/* Create 1:1 mapping in 256M pages */
+	xori	r7, r7, 1			/* Revert back to Original TS */
+
+	li	r8, 0				/* PageNumber */
+	li	r6, 3				/* TLB Index, start at 3  */
+
+next_tlb:
+	rotlwi	r3, r8, 28			/* Create EPN (bits 0-3) */
+	mr	r4, r3				/* RPN = EPN  */
+	ori	r3, r3, (PPC44x_TLB_VALID | PPC44x_TLB_256M) /* SIZE = 256M, Valid */
+	insrwi	r3, r7, 1, 23			/* Set TS from r7 */
+
+	tlbwe	r3, r6, PPC44x_TLB_PAGEID	/* PageID field : EPN, V, SIZE */
+	tlbwe	r4, r6, PPC44x_TLB_XLAT		/* Address translation : RPN   */
+	tlbwe	r5, r6, PPC44x_TLB_ATTRIB	/* Attributes */
+
+	addi	r8, r8, 1			/* Increment PN */
+	addi	r6, r6, 1			/* Increment TLB Index */
+	cmpwi	r8, 8				/* Are we done ? */
+	bne	next_tlb
+	isync
+
+	/* Jump to the new mapping 1:1 */
+	li	r9,0
+	insrwi	r9, r7, 1, 26			/* Set MSR[IS] = r7 */
+
+	bl	1f
+1:	mflr	r8
+	and	r8, r8, r11			/* Get our offset within page */
+	addi	r8, r8, (2f-1b)
+
+	and	r5, r25, r10			/* Get our target PageNum */
+	or	r8, r8, r5			/* Target jump address */
+
+	mtspr	SPRN_SRR0, r8
+	mtspr	SPRN_SRR1, r9
+	rfi
+2:
+	/* Invalidate the tmp entry we used */
+	li	r3, 0
+	tlbwe	r3, r24, PPC44x_TLB_PAGEID
+	sync
+
+	/* Restore the parameters */
+	mr	r3, r29
+	mr	r4, r30
+	mr	r5, r31
+
+	li	r0, 0
 #else
 	li	r0, 0
 

^ permalink raw reply related
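
Editorial sketch (not part of the patch): the address arithmetic behind step 4 of the comment block above, "Jump to the 1:1 mapping in original TS", can be modelled in plain C. The helper name one_to_one_target() and its parameters are hypothetical; the sketch only assumes what the assembly shows, namely that the page size is a power of two, that the offset mask is PageSize - 1, and that the target is the page number taken from the saved XLAT word combined with the offset of the current code within its page.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * xlat      - saved XLAT word of the entry we were running from (r25)
 * pc        - address of the label we want to land on, in the old mapping
 * page_size - size of that TLB entry in bytes (r10), a power of two
 */
static uint32_t one_to_one_target(uint32_t xlat, uint32_t pc,
				  uint32_t page_size)
{
	/* The mask trick below only works for power-of-two sizes. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	uint32_t offset_mask = page_size - 1;	/* subi r11, r10, 1 */
	uint32_t page_mask = ~offset_mask;	/* not  r10, r11    */

	/* Real page number from XLAT, offset within the page from pc. */
	return (xlat & page_mask) | (pc & offset_mask);
}
```

With a 256M entry whose XLAT word is 0x10000000, code at virtual address 0xC0001234 would resume at 0x10001234 once the 1:1 mapping is in place.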

* Re: [PATCH] powerpc/85xx: fix mpic configuration in CAMP mode
From: Fabio Baltieri @ 2011-07-12  7:38 UTC (permalink / raw)
  To: Scott Wood; +Cc: Poonam Aggrwal, linuxppc-dev, linux-kernel
In-Reply-To: <20110711143859.0e6c95d8@schlenkerla.am.freescale.net>

On Mon, Jul 11, 2011 at 9:38 PM, Scott Wood <scottwood@freescale.com> wrote:
> On Sun, 10 Jul 2011 20:55:32 +0200
> Fabio Baltieri <fabio.baltieri@gmail.com> wrote:
>
>> Change the string to check for CAMP mode boot on MPC85xx (eg. P2020) to match
>> the one in the corresponding dts files (p2020rdb_camp_core{0,1}.dts).
>>
>> Without this fix the mpic is configured as in the SMP boot mode, which causes
>> the first core to report a protected source interrupt error for devices
>> of the other core and lock up.
>>
>> Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com>
>> ---
>>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
>> b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
>> index 088f30b..a1e5e70 100644
>> --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
>> +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
>> @@ -58,7 +58,7 @@ void __init mpc85xx_rdb_pic_init(void)
>>               return;
>>       }
>>
>> -     if (of_flat_dt_is_compatible(root, "fsl,85XXRDB-CAMP")) {
>> +     if (of_flat_dt_is_compatible(root, "fsl,MPC85XXRDB-CAMP")) {
>>               mpic = mpic_alloc(np, r.start,
>>                       MPIC_PRIMARY |
>>                       MPIC_BIG_ENDIAN | MPIC_BROKEN_FRR_NIRQS,
>
> Shouldn't we be setting MPIC_SINGLE_DEST_CPU in this case (as we do for
> the other case)?
>
> Or just drop this and specify pic-no-reset in the mpic node.

Yeah, seems like a good idea. I still prefer to leave the
MPIC_WANTS_RESET for the default case (SMP mode), which I think is
what most people are using.

I'm sending a v2 of the patch, thanks!

-- 
Fabio Baltieri

^ permalink raw reply

* [PATCH v2] powerpc/85xx: fix mpic configuration in CAMP mode
From: Fabio Baltieri @ 2011-07-12  7:49 UTC (permalink / raw)
  To: Scott Wood, linuxppc-dev, linux-kernel; +Cc: Fabio Baltieri, Poonam Aggrwal
In-Reply-To: <20110711143859.0e6c95d8@schlenkerla.am.freescale.net>

Change the string to check for CAMP mode boot on MPC85xx (eg. P2020) to match
the one in the corresponding dts files (p2020rdb_camp_core{0,1}.dts).

Without this fix the mpic is configured as in the SMP boot mode, which causes
the first core to report a protected source interrupt error for devices
of the other core and lock up.

Also add MPIC_SINGLE_DEST_CPU on both P2020 based architectures in CAMP
mode as suggested by Scott Wood. Thanks.

Cc: Scott Wood <scottwood@freescale.com>
Cc: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com>
---
 arch/powerpc/platforms/85xx/mpc85xx_ds.c  |    3 ++-
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c |    5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
index c7b97f7..1b9a8cf 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
@@ -83,7 +83,8 @@ void __init mpc85xx_ds_pic_init(void)
 	if (of_flat_dt_is_compatible(root, "fsl,MPC8572DS-CAMP")) {
 		mpic = mpic_alloc(np, r.start,
 			MPIC_PRIMARY |
-			MPIC_BIG_ENDIAN | MPIC_BROKEN_FRR_NIRQS,
+			MPIC_BIG_ENDIAN | MPIC_BROKEN_FRR_NIRQS |
+			MPIC_SINGLE_DEST_CPU,
 			0, 256, " OpenPIC  ");
 	} else {
 		mpic = mpic_alloc(np, r.start,
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
index 088f30b..f5ff911 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
@@ -58,10 +58,11 @@ void __init mpc85xx_rdb_pic_init(void)
 		return;
 	}
 
-	if (of_flat_dt_is_compatible(root, "fsl,85XXRDB-CAMP")) {
+	if (of_flat_dt_is_compatible(root, "fsl,MPC85XXRDB-CAMP")) {
 		mpic = mpic_alloc(np, r.start,
 			MPIC_PRIMARY |
-			MPIC_BIG_ENDIAN | MPIC_BROKEN_FRR_NIRQS,
+			MPIC_BIG_ENDIAN | MPIC_BROKEN_FRR_NIRQS |
+			MPIC_SINGLE_DEST_CPU,
 			0, 256, " OpenPIC  ");
 	} else {
 		mpic = mpic_alloc(np, r.start,
-- 
1.7.5.1

^ permalink raw reply related

* softirqs are invoked while bottom halves are masked (was: Re: [PATCH] [PATCH] Fix deadlock in af_packet while stressing raw ethernet socket interface)
From: Thomas De Schampheleire @ 2011-07-12  9:23 UTC (permalink / raw)
  To: Eric Dumazet, linuxppc-dev; +Cc: netdev, Ronny Meeus, David Miller

Hi,

I'm adding the linuxppc-dev mailing list since this may be pointing to
an irq/softirq problem in the powerpc architecture-specific code...

On Tue, Jul 12, 2011 at 8:58 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday 12 July 2011 at 08:38 +0200, Ronny Meeus wrote:
>> On Tue, Jul 12, 2011 at 5:27 AM, David Miller <davem@davemloft.net> wrote:
>> > From: Ronny Meeus <ronny.meeus@gmail.com>
>> > Date: Sat, 11 Jun 2011 07:04:09 +0200
>> >
>> >> I was running a test: 1 application was sending raw Ethernet packets
>> >> on a physical looped interface while a second application was
>> >> receiving packets, so the latter application receives each packet 2
>> >> times (once while sending from the context of the first application
>> >> and a second time while receiving from the hardware).  After some
>> >> time, the test blocks due to a spinlock reentrance issue in
>> >> af_packet. Both the sending application and the softIRQ receiving
>> >> packets enter the spinlock code. After applying the patch below, the
>> >> issue is resolved.
>> >>
>> >> Signed-off-by: Ronny Meeus <ronny.meeus@gmail.com>
>> >
>> > The packet receive hooks should always be called with software
>> > interrupts disabled, it is a bug if this is not happening.  Your
>> > patch should not be necessary at all.
>> >
>> >
>>
>> Can you be a bit more specific on where the software interrupts should
>> be disabled?
>>
>> Below you find the information I get after switching on the spin_lock
>> issue detection in the kernel.
>> In this run also I-PIPE was active but this issue is also seen with
>> I-PIPE disabled.
>>
>
> This seems a bug, but in softirq handling in your arch
>
>> [   96.450034] BUG: spinlock recursion on CPU#0, send_eth_socket/1907
>> [   96.540451]  lock: eaeb8c9c, .magic: dead4ead, .owner:
>> send_eth_socket/1907, .owner_cpu: 0
>> [   96.656060] Call Trace: (unreliable)0161000 success=90] [c000789c]
>> show_stack+0x78/0x18c160001
>> [   96.827988] [efff3dd0] [c01e2a50] spin_bug+0xa8/0xc0=0000162000 succ
>> [   96.920712] [efff3df0] [c01e2b9c] do_raw_spin_lock+0x70/0x1c4ount=0000163000
>> [   97.022823] [efff3e20] [c0388d88] _raw_spin_lock+0x10/0x2001
>> [   97.121800] [efff3e30] [c03391a0] tpacket_rcv+0x274/0x61c164001
>> [   97.219733] [efff3e80] [c02d5970] __netif_receive_skb+0x1a8/0x36c0000165001
>> [   97.326001] [efff3eb0] [c02d6234] netif_receive_skb+0x98/0xacess=0000166001
>> [   97.427060] [efff3ee0] [c029e6c4] ingress_rx_default_dqrr+0x42c/0x4b8
>> [   97.504194] [efff3f10] [c02ba524] qman_poll_dqrr+0x1e0/0x284
>> [   97.571948] [efff3f50] [c029ff3c] dpaa_eth_poll+0x34/0xd0
>> [   97.636579] [efff3f70] [c02d6514] net_rx_action+0xc0/0x1e8
>> [   97.702256] [efff3fa0] [c0034d28] __do_softirq+0x138/0x210
>> [   97.767928] [efff3ff0] [c0010128] call_do_softirq+0x14/0x24
>> [   97.834641] [ec479a90] [c0004a00] do_softirq+0xb4/0xec
>
> We got an IRQ, and we start do_softirq() from irq_exit()
>
>> [   97.896148] [ec479ab0] [c0034a44] irq_exit+0x60/0xb8
>> [   97.955573] [ec479ac0] [c000a490] __ipipe_do_IRQ+0x88/0xc0
>> [   98.021253] [ec479ae0] [c0070214] __ipipe_sync_stage+0x1f0/0x27c
>> [   98.093176] [ec479b20] [c0009f28] __ipipe_handle_irq+0x1b8/0x1e8
>> [   98.165106] [ec479b50] [c000a210] __ipipe_grab_irq+0x18c/0x1bc
>> [   98.234947] [ec479b80] [c0011520] __ipipe_ret_from_except+0x0/0xc
>> [   98.307915] --- Exception: 501 at __packet_get_status+0x48/0x70
>> [   98.307920]     LR = __packet_get_status+0x44/0x70
>> [   98.436082] [ec479c40] [00000578] 0x578 (unreliable)
>> [   98.495524] [ec479c50] [c0338360] packet_lookup_frame+0x48/0x70
>> [   98.566405] [ec479c60] [c03391b4] tpacket_rcv+0x288/0x61c
>
> Here we were in BH disabled section, since dev_queue_xmit() contains :
>
>        rcu_read_lock_bh()
>
>
>> [   98.631037] [ec479cb0] [c02d762c] dev_hard_start_xmit+0x164/0x588
>> [   98.704003] [ec479cf0] [c0338d6c] packet_sendmsg+0x8c4/0x988
>> [   98.771758] [ec479d70] [c02c3838] sock_sendmsg+0x90/0xb4
>> [   98.835348] [ec479e40] [c02c4420] sys_sendto+0xdc/0x120
>> [   98.897891] [ec479f10] [c02c57d0] sys_socketcall+0x148/0x210
>> [   98.965648] [ec479f40] [c001084c] ret_from_syscall+0x0/0x3c
>> [   99.032361] --- Exception: c01 at 0x48051f00
>> [   99.032365]     LR = 0x4808e030
>> [  100.563009] BUG: spinlock lockup on CPU#0, send_eth_socket/1907, eaeb8c9c
>> [  100.644283] Call Trace:
>> [  100.673480] [efff3db0] [c000789c] show_stack+0x78/0x18c (unreliable)
>> [  100.749589] [efff3df0] [c01e2c94] do_raw_spin_lock+0x168/0x1c4
>> [  100.819430] [efff3e20] [c0388d88] _raw_spin_lock+0x10/0x20
>> [  100.885102] [efff3e30] [c03391a0] tpacket_rcv+0x274/0x61c
>> [  100.949733] [efff3e80] [c02d5970] __netif_receive_skb+0x1a8/0x36c
>> [  101.022699] [efff3eb0] [c02d6234] netif_receive_skb+0x98/0xac
>> [  101.091497] [efff3ee0] [c029e6c4] ingress_rx_default_dqrr+0x42c/0x4b8
>> [  101.168628] [efff3f10] [c02ba524] qman_poll_dqrr+0x1e0/0x284
>> [  101.236385] [efff3f50] [c029ff3c] dpaa_eth_poll+0x34/0xd0
>> [  101.301016] [efff3f70] [c02d6514] net_rx_action+0xc0/0x1e8
>> [  101.366692] [efff3fa0] [c0034d28] __do_softirq+0x138/0x210
>> [  101.432364] [efff3ff0] [c0010128] call_do_softirq+0x14/0x24
>> [  101.499078] [ec479a90] [c0004a00] do_softirq+0xb4/0xec
>> [  101.560585] [ec479ab0] [c0034a44] irq_exit+0x60/0xb8
>> [  101.620003] [ec479ac0] [c000a490] __ipipe_do_IRQ+0x88/0xc0
>> [  101.685678] [ec479ae0] [c0070214] __ipipe_sync_stage+0x1f0/0x27c
>> [  101.757601] [ec479b20] [c0009f28] __ipipe_handle_irq+0x1b8/0x1e8
>> [  101.829523] [ec479b50] [c000a210] __ipipe_grab_irq+0x18c/0x1bc
>> [  101.899363] [ec479b80] [c0011520] __ipipe_ret_from_except+0x0/0xc
>> [  101.972333] --- Exception: 501 at __packet_get_status+0x48/0x70
>> [  101.972338]     LR = __packet_get_status+0x44/0x70
>> [  102.100490] [ec479c40] [00000578] 0x578 (unreliable)
>> [  102.159929] [ec479c50] [c0338360] packet_lookup_frame+0x48/0x70
>> [  102.230810] [ec479c60] [c03391b4] tpacket_rcv+0x288/0x61c
>> [  102.295442] [ec479cb0] [c02d762c] dev_hard_start_xmit+0x164/0x588
>> [  102.368407] [ec479cf0] [c0338d6c] packet_sendmsg+0x8c4/0x988
>> [  102.436162] [ec479d70] [c02c3838] sock_sendmsg+0x90/0xb4
>> [  102.499747] [ec479e40] [c02c4420] sys_sendto+0xdc/0x120
>> [  102.562296] [ec479f10] [c02c57d0] sys_socketcall+0x148/0x210
>> [  102.630052] [ec479f40] [c001084c] ret_from_syscall+0x0/0x3c
>> [  102.696773] --- Exception: c01 at 0x48051f00
>> [  102.696777]     LR = 0x4808e030
>>
>>


Note that the reason we are seeing this problem, may be because the
kernel we are using contains some patches from Freescale.
Specifically, in dev_queue_xmit(), support is added for hardware queue
handling, just before entering the rcu_read_lock_bh():

        if (dev->features & NETIF_F_HW_QDISC) {
                txq = dev_pick_tx(dev, skb);
                return dev_hard_start_xmit(skb, dev, txq);
        }

        /* Disable soft irqs for various locks below. Also
         * stops preemption for RCU.
         */
        rcu_read_lock_bh();

We just tried moving the escaping to dev_hard_start_xmit() after
taking the lock, but this gives a large number of other problems, e.g.

[   78.662428] BUG: sleeping function called from invalid context at
mm/slab.c:3101
[   78.751004] in_atomic(): 1, irqs_disabled(): 0, pid: 1908, name:
send_eth_socket
[   78.839582] Call Trace:
[   78.868784] [ec537b70] [c000789c] show_stack+0x78/0x18c (unreliable)
[   78.944905] [ec537bb0] [c0022900] __might_sleep+0x100/0x118
[   79.011636] [ec537bc0] [c00facc4] kmem_cache_alloc+0x48/0x118
[   79.080446] [ec537be0] [c02cd0e8] __alloc_skb+0x50/0x130
[   79.144047] [ec537c00] [c02cdf5c] skb_copy+0x44/0xc8
[   79.203478] [ec537c20] [c029f904] dpa_tx+0x154/0x758
[   79.262907] [ec537c80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
[   79.335878] [ec537cc0] [c02d7aac] dev_queue_xmit+0x5c/0x3a4
[   79.402602] [ec537cf0] [c0338d4c] packet_sendmsg+0x8c4/0x988
[   79.470363] [ec537d70] [c02c3838] sock_sendmsg+0x90/0xb4
[   79.533960] [ec537e40] [c02c4420] sys_sendto+0xdc/0x120
[   79.596514] [ec537f10] [c02c57d0] sys_socketcall+0x148/0x210
[   79.664287] [ec537f40] [c001084c] ret_from_syscall+0x0/0x3c
[   79.731015] --- Exception: c01 at 0x48051f00
[   79.731019]     LR = 0x4808e030


Note that this may just be the cause for us seeing this problem. If
indeed the main problem is irq_exit() invoking softirqs in a locked
context, then this patch adding hardware queue support is not really
relevant.

Any suggestions from the developers at linuxppc-dev are very welcome...

Thomas

^ permalink raw reply

* [PATCH] [v3] kexec-tools: ppc32: Fixup ThreadPointer for purgatory code
From: Suzuki K. Poulose @ 2011-07-12  9:50 UTC (permalink / raw)
  To: Simon Horman
  Cc: Suzuki K. Poulose, kexec, Josh Boyer, Paul Mackerras,
	linux ppc dev, Vivek Goyal

PPC32 ELF ABI expects r2 to be loaded with Thread Pointer, which is 0x7000
bytes past the end of TCB. Though the purgatory is single threaded, it uses
TCB scratch space in vsnprintf(). This patch allocates a 1024-byte TCB
and populates the TP with the address accordingly.

Changes from V2: Avoid address overflow in TP allocation.
Changes from V1: Fixed the addr calculation for uImage support.


Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Ryan S. Arnold <rsa@us.ibm.com>
---

 kexec/arch/ppc/kexec-elf-ppc.c     |   19 +++++++++++++++++++
 kexec/arch/ppc/kexec-uImage-ppc.c  |   17 +++++++++++++++++
 purgatory/arch/ppc/purgatory-ppc.c |    2 +-
 purgatory/arch/ppc/v2wrap_32.S     |    4 ++++
 4 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/kexec/arch/ppc/kexec-elf-ppc.c b/kexec/arch/ppc/kexec-elf-ppc.c
index f4443b4..ecbbbeb 100644
--- a/kexec/arch/ppc/kexec-elf-ppc.c
+++ b/kexec/arch/ppc/kexec-elf-ppc.c
@@ -414,6 +414,25 @@ int elf_ppc_load(int argc, char **argv,	const char *buf, off_t len,
 	elf_rel_set_symbol(&info->rhdr, "stack", &addr, sizeof(addr));
 #undef PUL_STACK_SIZE
 
+	/*
+	 * Fixup ThreadPointer(r2) for purgatory.
+	 * PPC32 ELF ABI expects : 
+	 * ThreadPointer (TP) = TCB + 0x7000
+	 * We manually allocate a TCB space and set the TP
+	 * accordingly.
+	 */
+#define TCB_SIZE 1024
+#define TCB_TP_OFFSET 0x7000	/* PPC32 ELF ABI */
+
+	addr = locate_hole(info, TCB_SIZE, 0, 0,
+				((unsigned long)elf_max_addr(&ehdr) - TCB_TP_OFFSET),
+				1);
+	addr += TCB_SIZE + TCB_TP_OFFSET;
+	elf_rel_set_symbol(&info->rhdr, "my_thread_ptr", &addr, sizeof(addr));
+
+#undef TCB_SIZE
+#undef TCB_TP_OFFSET
+
 	addr = elf_rel_get_addr(&info->rhdr, "purgatory_start");
 	info->entry = (void *)addr;
 #endif
diff --git a/kexec/arch/ppc/kexec-uImage-ppc.c b/kexec/arch/ppc/kexec-uImage-ppc.c
index 1d71374..216c82d 100644
--- a/kexec/arch/ppc/kexec-uImage-ppc.c
+++ b/kexec/arch/ppc/kexec-uImage-ppc.c
@@ -228,6 +228,23 @@ static int ppc_load_bare_bits(int argc, char **argv, const char *buf,
 	/* No allocation past here in order not to overwrite the stack */
 #undef PUL_STACK_SIZE
 
+	/*
+	 * Fixup ThreadPointer(r2) for purgatory.
+	 * PPC32 ELF ABI expects : 
+	 * ThreadPointer (TP) = TCB + 0x7000
+	 * We manually allocate a TCB space and set the TP
+	 * accordingly.
+	 */
+#define TCB_SIZE 	1024
+#define TCB_TP_OFFSET 	0x7000	/* PPC32 ELF ABI */
+	addr = locate_hole(info, TCB_SIZE, 0, 0,
+				((unsigned long)-1 - TCB_TP_OFFSET),
+				1);
+	addr += TCB_SIZE + TCB_TP_OFFSET;
+	elf_rel_set_symbol(&info->rhdr, "my_thread_ptr", &addr, sizeof(addr));
+#undef TCB_TP_OFFSET
+#undef TCB_SIZE
+
 	addr = elf_rel_get_addr(&info->rhdr, "purgatory_start");
 	info->entry = (void *)addr;
 
diff --git a/purgatory/arch/ppc/purgatory-ppc.c b/purgatory/arch/ppc/purgatory-ppc.c
index 349e750..3e6b354 100644
--- a/purgatory/arch/ppc/purgatory-ppc.c
+++ b/purgatory/arch/ppc/purgatory-ppc.c
@@ -26,7 +26,7 @@ unsigned int panic_kernel = 0;
 unsigned long backup_start = 0;
 unsigned long stack = 0;
 unsigned long dt_offset = 0;
-unsigned long my_toc = 0;
+unsigned long my_thread_ptr = 0;
 unsigned long kernel = 0;
 
 void setup_arch(void)
diff --git a/purgatory/arch/ppc/v2wrap_32.S b/purgatory/arch/ppc/v2wrap_32.S
index 8442d16..8b60677 100644
--- a/purgatory/arch/ppc/v2wrap_32.S
+++ b/purgatory/arch/ppc/v2wrap_32.S
@@ -56,6 +56,10 @@ master:
 	mr      17,3            # save cpu id to r17
 	mr      15,4            # save physical address in reg15
 
+	lis	6,my_thread_ptr@h
+	ori	6,6,my_thread_ptr@l
+	lwz	2,0(6)		# setup ThreadPointer(TP)
+
 	lis	6,stack@h
 	ori	6,6,stack@l
 	lwz     1,0(6)          #setup stack

^ permalink raw reply related
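
Editorial sketch (not part of the patch): the thread pointer arithmetic described in the changelog above, reduced to a few lines of C. It assumes 32-bit addresses and reuses the patch's TCB_SIZE and TCB_TP_OFFSET values; the function names max_tcb_hole_end() and thread_pointer_for() are hypothetical and only illustrate why the search limit is lowered by 0x7000 so that the final addition cannot wrap.

```c
#include <assert.h>
#include <stdint.h>

#define TCB_SIZE	1024u	/* scratch TCB size used by the patch */
#define TCB_TP_OFFSET	0x7000u	/* TP offset quoted from the PPC32 ELF ABI */

/* Highest address the TCB may end at, as in the uImage variant of the patch. */
static uint32_t max_tcb_hole_end(void)
{
	return UINT32_MAX - TCB_TP_OFFSET;
}

/* Hypothetical helper: value to load into r2 for a TCB starting at tcb_start. */
static uint32_t thread_pointer_for(uint32_t tcb_start)
{
	/* Guard against the 32-bit wrap-around that V3 of the patch avoids. */
	assert(tcb_start <= UINT32_MAX - TCB_SIZE - TCB_TP_OFFSET);

	return tcb_start + TCB_SIZE + TCB_TP_OFFSET;
}
```

For a TCB placed at 0x01000000 this gives a thread pointer of 0x01007400.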

* Re: softirqs are invoked while bottom halves are masked
From: David Miller @ 2011-07-12 10:01 UTC (permalink / raw)
  To: patrickdepinguin+linuxppc; +Cc: linuxppc-dev, ronny.meeus, eric.dumazet, netdev
In-Reply-To: <CAAXf6LXHeiCT+n=Wpdf=QfUXUHUSd7fmx9y3kB7te8N3ON1fTg@mail.gmail.com>

From: Thomas De Schampheleire <patrickdepinguin+linuxppc@gmail.com>
Date: Tue, 12 Jul 2011 11:23:28 +0200

> Note that the reason we are seeing this problem, may be because the
> kernel we are using contains some patches from Freescale.
> Specifically, in dev_queue_xmit(), support is added for hardware queue
> handling, just before entering the rcu_read_lock_bh():
> 
>         if (dev->features & NETIF_F_HW_QDISC) {
>                 txq = dev_pick_tx(dev, skb);
>                 return dev_hard_start_xmit(skb, dev, txq);
>         }
> 
>         /* Disable soft irqs for various locks below. Also
>          * stops preemption for RCU.
>          */
>         rcu_read_lock_bh();
> 
> We just tried moving the escaping to dev_hard_start_xmit() after
> taking the lock, but this gives a large number of other problems, e.g.

This is definitely why you are seeing this behavior.

You cannot invoke dev_hard_start_xmit() without softirqs
being disabled.  It breaks everything.

This is what happens when you integrate networking patches
which were not reviewed and vetted on netdev.

^ permalink raw reply

* Re: softirqs are invoked while bottom halves are masked (was: Re: [PATCH] [PATCH] Fix deadlock in af_packet while stressing raw ethernet socket interface)
From: Eric Dumazet @ 2011-07-12 10:10 UTC (permalink / raw)
  To: Thomas De Schampheleire; +Cc: linuxppc-dev, netdev, Ronny Meeus, David Miller
In-Reply-To: <CAAXf6LXHeiCT+n=Wpdf=QfUXUHUSd7fmx9y3kB7te8N3ON1fTg@mail.gmail.com>

On Tuesday 12 July 2011 at 11:23 +0200, Thomas De Schampheleire wrote:
> Hi,
> 
> I'm adding the linuxppc-dev mailing list since this may be pointing to
> an irq/softirq problem in the powerpc architecture-specific code...

> 
> Note that the reason we are seeing this problem, may be because the
> kernel we are using contains some patches from Freescale.
> Specifically, in dev_queue_xmit(), support is added for hardware queue
> handling, just before entering the rcu_read_lock_bh():
> 

Oh well, what a mess.

>         if (dev->features & NETIF_F_HW_QDISC) {
>                 txq = dev_pick_tx(dev, skb);



>                 return dev_hard_start_xmit(skb, dev, txq);
	This needs to be:
		local_bh_disable();
		rc = dev_hard_start_xmit(skb, dev, txq);
		local_bh_enable();
		return rc;


>         }
> 
>         /* Disable soft irqs for various locks below. Also
>          * stops preemption for RCU.
>          */
>         rcu_read_lock_bh();
> 
> We just tried moving the escaping to dev_hard_start_xmit() after
> taking the lock, but this gives a large number of other problems, e.g.
> 
> [   78.662428] BUG: sleeping function called from invalid context at
> mm/slab.c:3101
> [   78.751004] in_atomic(): 1, irqs_disabled(): 0, pid: 1908, name:
> send_eth_socket
> [   78.839582] Call Trace:
> [   78.868784] [ec537b70] [c000789c] show_stack+0x78/0x18c (unreliable)
> [   78.944905] [ec537bb0] [c0022900] __might_sleep+0x100/0x118
> [   79.011636] [ec537bc0] [c00facc4] kmem_cache_alloc+0x48/0x118
> [   79.080446] [ec537be0] [c02cd0e8] __alloc_skb+0x50/0x130
> [   79.144047] [ec537c00] [c02cdf5c] skb_copy+0x44/0xc8
> [   79.203478] [ec537c20] [c029f904] dpa_tx+0x154/0x758

doing GFP_KERNEL allocations in dpa_tx() is wrong, for sure.


> [   79.262907] [ec537c80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
> [   79.335878] [ec537cc0] [c02d7aac] dev_queue_xmit+0x5c/0x3a4
> [   79.402602] [ec537cf0] [c0338d4c] packet_sendmsg+0x8c4/0x988
> [   79.470363] [ec537d70] [c02c3838] sock_sendmsg+0x90/0xb4
> [   79.533960] [ec537e40] [c02c4420] sys_sendto+0xdc/0x120
> [   79.596514] [ec537f10] [c02c57d0] sys_socketcall+0x148/0x210
> [   79.664287] [ec537f40] [c001084c] ret_from_syscall+0x0/0x3c
> [   79.731015] --- Exception: c01 at 0x48051f00
> [   79.731019]     LR = 0x4808e030
> 
> 
> Note that this may just be the cause for us seeing this problem. If
> indeed the main problem is irq_exit() invoking softirqs in a locked
> context, then this patch adding hardware queue support is not really
> relevant.

irq_exit() is fine. The recursion happens because BHs are not masked on
this path, and that is caused by the Freescale patches.

Really, suggesting an af_packet patch to solve a problem introduced in
an out of tree patch is insane.

You guys should have clearly stated you were using an alien kernel.

^ permalink raw reply
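
Editorial sketch (not from the thread): a toy userspace model of the point made above, that softirqs raised by an interrupt only run on interrupt exit when bottom halves are not masked. Every name here (struct cpu_model, model_xmit() and so on) is hypothetical, and nothing in it is kernel code. rx_handler() stands in for tpacket_rcv(), and the lock_held flag stands in for the spinlock seen in the traces.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_model {
	int bh_disable_depth;	/* > 0 means bottom halves are masked */
	bool lock_held;		/* stand-in for the af_packet spinlock */
	bool softirq_pending;
	int recursions;		/* handler ran while the lock was held */
	int handled;		/* handler ran and took the lock cleanly */
};

/* Stand-in for the receive hook that takes the lock. */
static void rx_handler(struct cpu_model *c)
{
	if (c->lock_held) {
		c->recursions++;	/* would be "spinlock recursion" */
		return;
	}
	c->lock_held = true;
	c->handled++;
	c->lock_held = false;
}

/* Run the pending softirq only if bottom halves are not masked. */
static void run_pending(struct cpu_model *c)
{
	if (c->softirq_pending && c->bh_disable_depth == 0) {
		c->softirq_pending = false;
		rx_handler(c);
	}
}

/* An interrupt arrives, raises the softirq, then exits. */
static void model_irq(struct cpu_model *c)
{
	c->softirq_pending = true;
	run_pending(c);
}

static void model_bh_disable(struct cpu_model *c)
{
	c->bh_disable_depth++;
}

static void model_bh_enable(struct cpu_model *c)
{
	assert(c->bh_disable_depth > 0);
	c->bh_disable_depth--;
	run_pending(c);		/* deferred work runs once unmasked */
}

/* Transmit path that is interrupted while it holds the lock. */
static void model_xmit(struct cpu_model *c, bool mask_bh)
{
	if (mask_bh)
		model_bh_disable(c);
	c->lock_held = true;
	model_irq(c);
	c->lock_held = false;
	if (mask_bh)
		model_bh_enable(c);
}
```

Calling model_xmit() with mask_bh set to false records one recursion, which corresponds to the reported lockup; with mask_bh set to true the handler is deferred until the lock has been released.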

* Re: [RFC PATCH 7/7] powerpc: Support RELOCATABLE kernel for PPC44x
From: Suzuki Poulose @ 2011-07-12 11:09 UTC (permalink / raw)
  To: Michal Simek; +Cc: tmarri, linuxppc-dev, john.williams, arnd
In-Reply-To: <1308233668-24166-8-git-send-email-monstr@monstr.eu>

On 06/16/11 19:44, Michal Simek wrote:
> Changes:
> - Find out address where kernel runs
> - Create the first 256MB TLB from online detected address
>
> Limitations:
> - Kernel must be aligned to 256MB
>
> Backport:
> - Changes in page.h are backported from newer kernel version
>
> mmu_mapin_ram function has to reflect offset in memory start.
> memstart_addr and kernstart_addr are setup directly from asm
> code to ensure that only ppc44x is affected.
>
> Signed-off-by: Michal Simek<monstr@monstr.eu>
> ---
>   arch/powerpc/Kconfig            |    3 ++-
>   arch/powerpc/include/asm/page.h |    7 ++++++-
>   arch/powerpc/kernel/head_44x.S  |   28 ++++++++++++++++++++++++++++
>   arch/powerpc/mm/44x_mmu.c       |    6 +++++-
>   4 files changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 45c9683..34c521e 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -796,7 +796,8 @@ config LOWMEM_CAM_NUM
>
>   config RELOCATABLE
>   	bool "Build a relocatable kernel (EXPERIMENTAL)"
> -	depends on EXPERIMENTAL&&  ADVANCED_OPTIONS&&  FLATMEM&&  FSL_BOOKE
> +	depends on EXPERIMENTAL&&  ADVANCED_OPTIONS&&  FLATMEM
> +	depends on FSL_BOOKE || (44x&&  !SMP)
>   	help
>   	  This builds a kernel image that is capable of running at the
>   	  location the kernel is loaded at (some alignment restrictions may
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 4940662..e813cc2 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -108,8 +108,13 @@ extern phys_addr_t kernstart_addr;
>   #define pfn_to_kaddr(pfn)	__va((pfn)<<  PAGE_SHIFT)
>   #define virt_addr_valid(kaddr)	pfn_valid(__pa(kaddr)>>  PAGE_SHIFT)
>
> -#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET - MEMORY_START))
> +#ifdef CONFIG_BOOKE
> +#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + KERNELBASE))
> +#define __pa(x) ((unsigned long)(x) + PHYSICAL_START - KERNELBASE)
> +#else
> +#define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + PAGE_OFFSET - MEMORY_START))
>   #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET + MEMORY_START)
> +#endif
>
>   /*
>    * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI,
> diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
> index d80ce05..6a63d32 100644
> --- a/arch/powerpc/kernel/head_44x.S
> +++ b/arch/powerpc/kernel/head_44x.S
> @@ -59,6 +59,17 @@ _ENTRY(_start);
>   	 * of abatron_pteptrs
>   	 */
>   	nop
> +
> +#ifdef CONFIG_RELOCATABLE
> +	bl	jump                            /* Find our address */
> +	nop
> +jump:	mflr	r25                              /* Make it accessible */
> +	/* just for and */
> +	lis     r26, 0xfffffff0@h
> +	ori     r26, r26, 0xfffffff0@l
> +	and.	r21, r25, r26
> +#endif

Hmm. So we are assuming we are running from a 1:1 mapping at the entry.
It is much safer to read our TLB entry and use the RPN instead.


> +#ifdef CONFIG_RELOCATABLE
> +	/* load physical address where kernel runs */
> +	mr	r4,r21
> +#else
>   	/* Kernel is at PHYSICAL_START */
>   	lis	r4,PHYSICAL_START@h
>   	ori	r4,r4,PHYSICAL_START@l
> +#endif
>
>   	/* Load the kernel PID = 0 */
>   	li	r0,0
> @@ -258,6 +274,18 @@ skpinv:	addi	r4,r4,1				/* Increment */
>   	mr	r5,r29
>   	mr	r6,r28
>   	mr	r7,r27
> +
> +#ifdef CONFIG_RELOCATABLE
> +	/* save kernel and memory start */
> +	lis	r25,kernstart_addr@h
> +	ori	r25,r25,kernstart_addr@l
> +	stw	r21,4(r25)

1) You have to use the ERPN value in the higher word of kernstart_addr.
2) You have to account for the (KERNELBASE - PAGE_OFFSET) shift for kernstart_addr.

> +
> +	lis	r25,memstart_addr@h
> +	ori	r25,r25,memstart_addr@l
> +	stw	r21,4(r25)

> +#endif
> +

Suzuki

^ permalink raw reply

* [PATCH] powerpc/44x: Use correct phy-address dt nodes on taishan.dts
From: Stefan Roese @ 2011-07-12 11:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Josh Boyer

Taishan (440GX) has the first PHY (EMAC2) mapped at PHY address 1
and the 2nd PHY (EMAC3) at PHY address 3. Use "phy-address" to
correctly describe this instead of "phy-map".

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
---
 arch/powerpc/boot/dts/taishan.dts |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/taishan.dts b/arch/powerpc/boot/dts/taishan.dts
index 058438f..1657ad0 100644
--- a/arch/powerpc/boot/dts/taishan.dts
+++ b/arch/powerpc/boot/dts/taishan.dts
@@ -337,7 +337,7 @@
 				rx-fifo-size = <4096>;
 				tx-fifo-size = <2048>;
 				phy-mode = "rgmii";
-				phy-map = <0x00000001>;
+				phy-address = <1>;
 				rgmii-device = <&RGMII0>;
 				rgmii-channel = <0>;
  				zmii-device = <&ZMII0>;
@@ -361,7 +361,7 @@
 				rx-fifo-size = <4096>;
 				tx-fifo-size = <2048>;
 				phy-mode = "rgmii";
-				phy-map = <0x00000003>;
+				phy-address = <3>;
 				rgmii-device = <&RGMII0>;
 				rgmii-channel = <1>;
  				zmii-device = <&ZMII0>;
-- 
1.7.6

^ permalink raw reply related

* Re: softirqs are invoked while bottom halves are masked (was: Re: [PATCH] [PATCH] Fix deadlock in af_packet while stressing raw ethernet socket interface)
From: Ronny Meeus @ 2011-07-12 12:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linuxppc-dev, netdev, afleming, Thomas De Schampheleire,
	David Miller
In-Reply-To: <1310465411.3314.6.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 12, 2011 at 12:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday 12 July 2011 at 11:23 +0200, Thomas De Schampheleire wrote:
>> Hi,
>>
>> I'm adding the linuxppc-dev mailing list since this may be pointing to
>> an irq/softirq problem in the powerpc architecture-specific code...
>
>>
>> Note that the reason we are seeing this problem, may be because the
>> kernel we are using contains some patches from Freescale.
>> Specifically, in dev_queue_xmit(), support is added for hardware queue
>> handling, just before entering the rcu_read_lock_bh():
>>
>
> Oh well, what a mess.
>
>>         if (dev->features & NETIF_F_HW_QDISC) {
>>                 txq = dev_pick_tx(dev, skb);
>
>
>
>>                 return dev_hard_start_xmit(skb, dev, txq);
>        This needs to be:
>                local_bh_disable();
>                rc = dev_hard_start_xmit(skb, dev, txq);
>                local_bh_enable();
>                return rc;
>
>
>>         }
>>
>>         /* Disable soft irqs for various locks below. Also
>>          * stops preemption for RCU.
>>          */
>>         rcu_read_lock_bh();
>>
>> We just tried moving the escaping to dev_hard_start_xmit() after
>> taking the lock, but this gives a large number of other problems, e.g.
>>
>> [   78.662428] BUG: sleeping function called from invalid context at
>> mm/slab.c:3101
>> [   78.751004] in_atomic(): 1, irqs_disabled(): 0, pid: 1908, name:
>> send_eth_socket
>> [   78.839582] Call Trace:
>> [   78.868784] [ec537b70] [c000789c] show_stack+0x78/0x18c (unreliable)
>> [   78.944905] [ec537bb0] [c0022900] __might_sleep+0x100/0x118
>> [   79.011636] [ec537bc0] [c00facc4] kmem_cache_alloc+0x48/0x118
>> [   79.080446] [ec537be0] [c02cd0e8] __alloc_skb+0x50/0x130
>> [   79.144047] [ec537c00] [c02cdf5c] skb_copy+0x44/0xc8
>> [   79.203478] [ec537c20] [c029f904] dpa_tx+0x154/0x758
>
> doing GFP_KERNEL allocations in dpa_tx() is wrong, for sure.
>
>
>> [   79.262907] [ec537c80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
>> [   79.335878] [ec537cc0] [c02d7aac] dev_queue_xmit+0x5c/0x3a4
>> [   79.402602] [ec537cf0] [c0338d4c] packet_sendmsg+0x8c4/0x988
>> [   79.470363] [ec537d70] [c02c3838] sock_sendmsg+0x90/0xb4
>> [   79.533960] [ec537e40] [c02c4420] sys_sendto+0xdc/0x120
>> [   79.596514] [ec537f10] [c02c57d0] sys_socketcall+0x148/0x210
>> [   79.664287] [ec537f40] [c001084c] ret_from_syscall+0x0/0x3c
>> [   79.731015] --- Exception: c01 at 0x48051f00
>> [   79.731019]     LR = 0x4808e030
>>
>>
>> Note that this may just be the cause for us seeing this problem. If
>> indeed the main problem is irq_exit() invoking softirqs in a locked
>> context, then this patch adding hardware queue support is not really
>> relevant.
>
> irq_exit() is fine. This is because BH are not masked because of the
> Freescale patches.
>
> Really, suggesting an af_packet patch to solve a problem introduced in
> an out of tree patch is insane.
>
> You guys should have clearly stated you were using an alien kernel.
>
>
>
>

Sorry for not mentioning we were using a patched kernel.
I was not aware that the code involved was patched by the FreeScale
patches we applied. The code found in the stack dumps is not
implemented in FSL specific files.

While reading the code of af_packet I saw that the spin_lock_bh is
used in several places while this is not the case in the tpacket_rcv
function. Since we had a locking issue in that code, I thought that my
patch would be OK.
I was not aware that for that specific function (tpacket_rcv) a
different lock primitive must be used. A suggestion for improvement:
it would be better to document this pre-condition in the code.

After doing the change you proposed our code now looks like:

	if (dev->features & NETIF_F_HW_QDISC) {
		txq = dev_pick_tx(dev, skb);
		local_bh_disable();
		rc = dev_hard_start_xmit(skb, dev, txq);
		local_bh_enable();
		return rc;
	}

	/* Disable soft irqs for various locks below. Also
	 * stops preemption for RCU.
	 */
	rcu_read_lock_bh();

but we still see the issue "BUG: sleeping function called from invalid context":

[   91.015989] BUG: sleeping function called from invalid context at
include/linux/skbuff.h:786
[   91.117096] in_atomic(): 1, irqs_disabled(): 0, pid: 1865, name: NMTX_T1842
[   91.200461] Call Trace:
[   91.229672] [ec58bbd0] [c000789c] show_stack+0x78/0x18c (unreliable)
[   91.305791] [ec58bc10] [c0022900] __might_sleep+0x100/0x118
[   91.372524] [ec58bc20] [c029f8d8] dpa_tx+0x128/0x758
[   91.431957] [ec58bc80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
[   91.504952] [ec58bcc0] [c02d7ab0] dev_queue_xmit+0x60/0x3ac
[   91.571692] [ec58bcf0] [c0338d54] packet_sendmsg+0x8c4/0x988
[   91.639457] [ec58bd70] [c02c3838] sock_sendmsg+0x90/0xb4
[   91.703066] [ec58be40] [c02c4420] sys_sendto+0xdc/0x120
[   91.765646] [ec58bf10] [c02c57d0] sys_socketcall+0x148/0x210
[   91.833420] [ec58bf40] [c001084c] ret_from_syscall+0x0/0x3c
[   91.900153] --- Exception: c01 at 0x4824df00
[   91.900157]     LR = 0x4828a030


The FreeScale patch that introduced this code was created by Andy
Fleming <afleming@freescale.com> (Put in CC).

The purpose of the patch is:
"
Subject: [PATCH] net: Add support for handling queueing in hardware

The QDisc code does a bunch of locking which is unnecessary if
you have hardware which handles all of the queueing.  Add
support for this, and skip over all of the queueing code if
the feature is enabled on a given device.
"

Ronny

^ permalink raw reply

* Re: softirqs are invoked while bottom halves are masked
From: David Miller @ 2011-07-12 12:08 UTC (permalink / raw)
  To: ronny.meeus
  Cc: linuxppc-dev, afleming, patrickdepinguin+linuxppc, eric.dumazet,
	netdev
In-Reply-To: <CAMJ=MEeC1hoqufs7AfFRn3yJoC8mdw7v+14N+7e=wQuJefm4_w@mail.gmail.com>

From: Ronny Meeus <ronny.meeus@gmail.com>
Date: Tue, 12 Jul 2011 14:03:04 +0200

> but we still see the issue "BUG: sleeping function called from invalid context":
> 
> [   91.015989] BUG: sleeping function called from invalid context at
> include/linux/skbuff.h:786
> [   91.117096] in_atomic(): 1, irqs_disabled(): 0, pid: 1865, name: NMTX_T1842
> [   91.200461] Call Trace:
> [   91.229672] [ec58bbd0] [c000789c] show_stack+0x78/0x18c (unreliable)
> [   91.305791] [ec58bc10] [c0022900] __might_sleep+0x100/0x118
> [   91.372524] [ec58bc20] [c029f8d8] dpa_tx+0x128/0x758

Because this dpa driver's transmit method is doing something else that
is not allowed in software interrupt context.

You must remove all things that might sleep in this driver's
->ndo_start_xmit method, and I do mean everything.

^ permalink raw reply

* Re: softirqs are invoked while bottom halves are masked
From: David Miller @ 2011-07-12 12:13 UTC (permalink / raw)
  To: ronny.meeus
  Cc: linuxppc-dev, afleming, patrickdepinguin+linuxppc, eric.dumazet,
	netdev
In-Reply-To: <20110712.050817.1253941735409335652.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 12 Jul 2011 05:08:17 -0700 (PDT)

> From: Ronny Meeus <ronny.meeus@gmail.com>
> Date: Tue, 12 Jul 2011 14:03:04 +0200
> 
>> but we still see the issue "BUG: sleeping function called from invalid context":
>> 
>> [   91.015989] BUG: sleeping function called from invalid context at
>> include/linux/skbuff.h:786
>> [   91.117096] in_atomic(): 1, irqs_disabled(): 0, pid: 1865, name: NMTX_T1842
>> [   91.200461] Call Trace:
>> [   91.229672] [ec58bbd0] [c000789c] show_stack+0x78/0x18c (unreliable)
>> [   91.305791] [ec58bc10] [c0022900] __might_sleep+0x100/0x118
>> [   91.372524] [ec58bc20] [c029f8d8] dpa_tx+0x128/0x758
> 
> Because this dpa driver's transmit method is doing something else that
> is not allowed in software interrupt context.
> 
> You must remove all things that might sleep in this driver's
> ->ndo_start_xmit method, and I do mean everything.

Also this whole HW QOS feature bit facility is beyond bogus.

What if the user enables a qdisc that the hardware can't handle, or a
configuration of a hw supported qdisc that the hardware can't support?

What if I have packet classification and packet actions enabled in the
packet scheduler?

These changes are terrible, and we really need you guys to sort out
your problems with these changes yourselves because your wounds are
entirely self-inflicted and totally not our problem.

^ permalink raw reply

* Re: [PATCH] mm: Fix output of total_ram.
From: Josh Boyer @ 2011-07-12 12:35 UTC (permalink / raw)
  To: LinuxPPC-dev, Benjamin Herrenschmidt
In-Reply-To: <20110705044419.GA20597@ozlabs.org>

On Tue, Jul 5, 2011 at 12:44 AM, Tony Breeds <tony@bakeyournoodle.com> wrote:
> On 32bit platforms that support >= 4GB memory total_ram was truncated.
> This creates a confusing printk:
>        Top of RAM: 0x100000000, Total RAM: 0x0
> Fix that:
>        Top of RAM: 0x100000000, Total RAM: 0x100000000
>
> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>

Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
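
The truncation described in the quoted changelog can be reproduced in
userspace. The sketch below is not the patch itself (which is not shown
here); it only assumes that the total was accumulated in a 32-bit type,
and the function names are made up for illustration.

```c
#include <stdint.h>

/* Sum region sizes into a 32-bit accumulator, as a 32-bit
 * "unsigned long" would behave: the result wraps at 4GB. */
static uint32_t total_ram_32(const uint64_t *sizes, int n)
{
	uint32_t total = 0;

	for (int i = 0; i < n; i++)
		total += (uint32_t)sizes[i];
	return total;
}

/* Same sum with a 64-bit accumulator: no truncation. */
static uint64_t total_ram_64(const uint64_t *sizes, int n)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

For two 2GB regions the 32-bit sum is 0x0 and the 64-bit sum is
0x100000000, matching the two printk lines quoted above.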

^ permalink raw reply

* Re: [PATCH 1/3] powerpc/47x: allow kernel to be loaded in higher physical memory
From: Josh Boyer @ 2011-07-12 13:27 UTC (permalink / raw)
  To: Suzuki Poulose; +Cc: LinuxPPC-dev
In-Reply-To: <4E12ACD2.9050305@in.ibm.com>

On Tue, Jul 5, 2011 at 2:18 AM, Suzuki Poulose <suzuki@in.ibm.com> wrote:
> On 07/05/11 10:06, Tony Breeds wrote:
>>
>> From: Dave Kleikamp<shaggy@linux.vnet.ibm.com>
>>
>> The 44x code (which is shared by 47x) assumes the available physical
>> memory
>> begins at 0x00000000.  This is not necessarily the case in an AMP
>> environment.
>>
>> Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be
>> loaded into a higher memory range.
>
> I think the code assumes the kernel is loaded in a 256M aligned page. You may
> want to mention that in the description here.

Suzie, do you have any other concerns with this code in regards to
your kexec/kdump work for 4xx?  It seems fairly self-contained to me,
so I'd like to apply it but I want to make sure it is not going to
majorly conflict with the work you're doing.

josh

^ permalink raw reply

* RE: RFC: top level compatibles for virtual platforms
From: Yoder Stuart-B08248 @ 2011-07-12 14:20 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: Tabi Timur-B04825, Alexander Graf, linuxppc-dev@ozlabs.org,
	Gala Kumar-B11780
In-Reply-To: <20110711160646.291e977e@schlenkerla.am.freescale.net>



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, July 11, 2011 4:07 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Tabi Timur-B04825; Grant Likely; Benjamin Herrenschmidt; Gala Kumar-
> B11780; Alexander Graf; linuxppc-dev@ozlabs.org
> Subject: Re: RFC: top level compatibles for virtual platforms
>
> On Mon, 11 Jul 2011 15:41:35 -0500
> Yoder Stuart-B08248 <B08248@freescale.com> wrote:
>
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Monday, July 11, 2011 1:05 PM
> > >
> > > Just because Linux does it that way now doesn't mean it needs to.
> > > The interrupt controller has a compatible property.  Match on it
> > > like any other device.  You can find which one is the root interrupt
> > > controller by looking for nodes with the interrupt-controller
> > > property that doesn't have an explicit interrupt-parent (or an interrupts
> > > property?  seems to be a conflict between ePAPR and the original
> > > interrupt mapping document).
> >
> > This may be the right long term thing to do, but restructuring how
> > Linux powerpc platforms work is a bigger effort.  I was looking for an
> > incremental improvement over what we do now, which is pass a
> > compatible of MPC8544DS and P4080DS for these virtual platforms.
>
> A hack is usually easier than doing it right. :-)
>
> Though often the effort required for the latter is overstated, and the "right long term thing"
> never makes the jump to "short term plan".
>
> There are a few things that need to be driven off the device tree that currently aren't --
> using some mechanism other than the standard device model, if necessary (or as a first step) --
> and then we need a does-nothing default platform as the match of last resort.
>
> > However, they _are_ compatible with MPC8544DS and P4080DS so maybe
> > leaving the compatible string alone is ok for now.
>
> The virtual platforms are not compatible with MPC8544DS or P4080DS.  Only a subset of what is
> on those boards is provided.  And in the case of direct device assignment, often the things
> that are present are incompatible (e.g. different type of eTSEC).

Hmm.  Perhaps what we need is a real binding that defines specifically
what those compatibles mean.   While not identical, a KVM
virtual machine is compatible in certain areas with those
boards.

The ePAPR defines the top level compatible as:

    Specifies a list of platform architectures with which this
    platform is compatible. This property can be used by
    operating systems in selecting platform specific code.

1275 doesn't mention compatible on the root from what I can
see.
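
For illustration, here is roughly how an operating system could use such a
list to select platform code. A compatible property is a sequence of
NUL-terminated strings packed back to back; the sketch below just walks
that list. The function name and the strings in the usage note are
hypothetical, not a proposed binding.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` is one of the entries of a compatible property.
 * `prop` points at `len` bytes holding NUL-terminated strings back to back. */
static int compat_list_contains(const char *prop, size_t len, const char *name)
{
	size_t want = strlen(name);
	size_t off = 0;

	while (off < len) {
		const char *entry = prop + off;
		const char *end = memchr(entry, '\0', len - off);
		/* Tolerate a final entry that lacks its terminator. */
		size_t n = end ? (size_t)(end - entry) : len - off;

		if (n == want && memcmp(entry, name, n) == 0)
			return 1;
		off += n + 1;
	}
	return 0;
}
```

A virtual platform could then list a specific string first and a more
generic one after it, e.g. "vendor,board-a\0vendor,generic", and a kernel
that only knows the generic entry would still find a match.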

Stuart

^ permalink raw reply

* Re: softirqs are invoked while bottom halves are masked (was: Re: [PATCH] [PATCH] Fix deadlock in af_packet while stressing raw ethernet socket interface)
From: Eric Dumazet @ 2011-07-12 15:27 UTC (permalink / raw)
  To: Ronny Meeus
  Cc: linuxppc-dev, netdev, afleming, Thomas De Schampheleire,
	David Miller
In-Reply-To: <CAMJ=MEeC1hoqufs7AfFRn3yJoC8mdw7v+14N+7e=wQuJefm4_w@mail.gmail.com>

On Tuesday 12 July 2011 at 14:03 +0200, Ronny Meeus wrote:

> Sorry for not mentioning we were using a patched kernel.
> I was not aware that the code involved was patched by the FreeScale
> patches we applied. The code found in the stack dumps is not
> implemented in FSL specific files.
> 
> While reading the code of af_packet I saw that the spin_lock_bh is
> used in several places while this is not the case in the tpacket_rcv
> function. Since we had a locking issue in that code, I thought that my
> patch would be OK.
> I was not aware that for that specific function (tpacket_rcv) a
> different lock primitive must be used. A suggestion for improvement:
> it would be better to document this pre-condition in the code.
> 
> After doing the change you proposed our code now looks like:
> 
> 	if (dev->features & NETIF_F_HW_QDISC) {
> 		txq = dev_pick_tx(dev, skb);
> 		local_bh_disable();
> 		rc = dev_hard_start_xmit(skb, dev, txq);
> 		local_bh_enable();
> 		return rc;
> 	}
> 
> 	/* Disable soft irqs for various locks below. Also
> 	 * stops preemption for RCU.
> 	 */
> 	rcu_read_lock_bh();
> 
> but we still see the issue "BUG: sleeping function called from invalid context":

Of course you do, if this is the only change you made.

> 
> [   91.015989] BUG: sleeping function called from invalid context at
> include/linux/skbuff.h:786
> [   91.117096] in_atomic(): 1, irqs_disabled(): 0, pid: 1865, name: NMTX_T1842
> [   91.200461] Call Trace:
> [   91.229672] [ec58bbd0] [c000789c] show_stack+0x78/0x18c (unreliable)
> [   91.305791] [ec58bc10] [c0022900] __might_sleep+0x100/0x118
> [   91.372524] [ec58bc20] [c029f8d8] dpa_tx+0x128/0x758


Please read my mail again:

I said: "doing GFP_KERNEL allocations in dpa_tx() is wrong, for sure."

I don't have this code, but I suspect it's using skb_copy(skb, GFP_KERNEL).

Just say no, use GFP_ATOMIC instead.

The real question is why skb_copy() is done at all, since it's slow as hell.
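
To make the rule concrete, here is a small userspace model of why the
trace fires. It is not kernel code and not the dpa driver: every name
below is a stand-in invented for the sketch. It only assumes that an
allocation which may sleep is illegal while bottom halves are disabled.

```c
/* Stand-ins for the two allocation modes; not the kernel's gfp_t. */
enum gfp_model { MODEL_GFP_ATOMIC, MODEL_GFP_KERNEL };

static int bh_disable_depth;	/* models the softirq-disable nesting count */
static int sleep_violations;	/* counts "sleeping function called from invalid context" */

static void model_local_bh_disable(void) { bh_disable_depth++; }
static void model_local_bh_enable(void)  { bh_disable_depth--; }

/* A sleeping allocation is only legal when bottom halves are enabled. */
static void model_alloc(enum gfp_model flags)
{
	if (flags == MODEL_GFP_KERNEL && bh_disable_depth > 0)
		sleep_violations++;
}

/* Transmit path as in the quoted snippet: bottom halves are off around
 * the driver call. Returns the number of violations this call caused. */
static int model_xmit(enum gfp_model driver_flags)
{
	int before = sleep_violations;

	model_local_bh_disable();
	model_alloc(driver_flags);	/* stands in for the allocation inside dpa_tx() */
	model_local_bh_enable();
	return sleep_violations - before;
}
```

In this model model_xmit(MODEL_GFP_KERNEL) reports one violation and
model_xmit(MODEL_GFP_ATOMIC) reports none, which is the whole point of
the advice above.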

> [   91.431957] [ec58bc80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
> [   91.504952] [ec58bcc0] [c02d7ab0] dev_queue_xmit+0x60/0x3ac
> [   91.571692] [ec58bcf0] [c0338d54] packet_sendmsg+0x8c4/0x988
> [   91.639457] [ec58bd70] [c02c3838] sock_sendmsg+0x90/0xb4
> [   91.703066] [ec58be40] [c02c4420] sys_sendto+0xdc/0x120
> [   91.765646] [ec58bf10] [c02c57d0] sys_socketcall+0x148/0x210
> [   91.833420] [ec58bf40] [c001084c] ret_from_syscall+0x0/0x3c
> [   91.900153] --- Exception: c01 at 0x4824df00
> [   91.900157]     LR = 0x4828a030
> 

^ permalink raw reply
