LinuxPPC-Dev Archive on lore.kernel.org
* Re: Freescale 8272ADS configuration
From: Krishnan @ 2006-04-14 13:40 UTC (permalink / raw)
  To: linuxppc-embedded@ozlabs.org

Hi Carlos,

I found two interesting documents on the Web (one from a grad student and
the other from the Freescale website). Both cover the generic steps from
porting U-Boot to bringing up Linux. I don't know which kernel you will
decide to use, but they are quite useful reading and give very good input.

I have both of these documents. They are for the 860 and 8260 but, as
Wolfgang mentioned, these parts are very closely related since they are in
the same family. I cannot attach documents larger than 100 KB on this list.

One more "war story" for you too, from helicontech:

http://www.helicontech.co.il/whitepapers/LinuxBringUp.html

Cheers,
Krishnan


* bisect results for powerbook G4 not booting on latest git tree
From: Aristeu Sergio Rozanski Filho @ 2006-04-14 13:31 UTC (permalink / raw)
  To: linuxppc-dev

Hi,
	I was unable to find someone who already did it, so here is
	the possible culprit of latest git tree (pulled both Linus' and
	Paul's tree) not booting on a powerbook G4:

a0652fc9a28c3ef8cd59264bfcb089c44d1b0e06 is first bad commit
diff-tree a0652fc9a28c3ef8cd59264bfcb089c44d1b0e06 (from 55aab8cd3a498201b769a19de861c77516bdfd45)
Author: Paul Mackerras <paulus@samba.org>
Date:   Mon Mar 27 15:03:03 2006 +1100

    powerpc: Unify the 32 and 64 bit idle loops

    This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle
    loops.  It brings over the concept of having a ppc_md.power_save
    function from 32-bit to ARCH=powerpc, which lets us get rid of
    native_idle().  With this we will also be able to simplify the idle
    handling for pSeries and cell.

    Signed-off-by: Paul Mackerras <paulus@samba.org>

:040000 040000 c6998a13a214b16c0cb936526d9e087095530eb3 9282cff1b1d6a5893c17319c0fbe13d93f5369f7 M      arch
:040000 040000 44489477a1708a022f564892e233207434ab79ee 78d125804d5c52fef1ebc089bce121773bc8f5b7 M      include

if there's any other info that I can provide or a patch to try, please
let me know

-- 
Aristeu


* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Muli Ben-Yehuda @ 2006-04-14 14:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <1144961564.4935.24.camel@localhost.localdomain>

On Fri, Apr 14, 2006 at 06:52:44AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2006-04-13 at 20:31 +0300, Muli Ben-Yehuda wrote:
> > On Thu, Apr 13, 2006 at 11:07:12AM -0500, Olof Johansson wrote:
> > 
> > > Walking the DT means we need to hardcode it on PCI IDs, since the Apple
> > > OF doesn't give the Airport device a logical name. It's probably easier
> > > to implement than walking PCI, but we'd need to maintain a table. My
> > > vote is for PCI walking, I'll give that a shot over the weekend.
> > 
> > Cool! bonus points if you do it in drivers/pci and we can steal it
> > easily for Calgary on x86-64 :-)
> 
> How so ? Anything remotely related to the iommu is totally different...
> Besides, on x86-64, laptops _are_ more common, and thus the problem of
> cardbus cards is much more significant.

What I had in mind is an interface that given a PCI bridge will tell
you what's the most restrictive DMA mask for a device on that bridge,
so that you'll know whether you need to enable the IOMMU for that
bridge. I'll even settle for a function that tells you what's the most
restrictive DMA mask in the system, period. There's nothing inherently
arch specific about this.
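
As a rough sketch of the kind of interface described above: this is a
user-space model in C, not kernel code. struct model_dev and both function
names are made up for illustration, and the rule "enable the IOMMU when some
device cannot reach the highest RAM address" is an assumption about how a
caller would use the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device record; not the kernel's struct pci_dev. */
struct model_dev {
	const char *name;
	uint64_t dma_mask;	/* highest bus address the device can generate */
};

/* Return the most restrictive (numerically smallest) DMA mask in devs[]. */
static uint64_t most_restrictive_dma_mask(const struct model_dev *devs, size_t n)
{
	uint64_t mask = UINT64_MAX;	/* no restriction if there are no devices */
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].dma_mask < mask)
			mask = devs[i].dma_mask;
	return mask;
}

/*
 * Assumed policy: the bridge needs the IOMMU when at least one device behind
 * it cannot address the last byte of RAM. top_of_ram is the first address
 * past the end of memory and must be non-zero.
 */
static int bridge_needs_iommu(const struct model_dev *devs, size_t n,
			      uint64_t top_of_ram)
{
	return most_restrictive_dma_mask(devs, n) < top_of_ram - 1;
}
```

With one device limited to a 30-bit mask (0x3fffffff) behind the bridge,
bridge_needs_iommu() returns 1 for 1.5GB of RAM and 0 for 512MB.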

(and as a side note, the IOMMU we are working on on x86-64 is Calgary,
which is actually roughly the same chipset used in some PPC
machines...)

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/


* Re: bisect results for powerbook G4 not booting on latest git tree
From: Aristeu Sergio Rozanski Filho @ 2006-04-14 16:14 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20060414133138.GG16901@cathedrallabs.org>

> 	I was unable to find someone who already did it, so here it
> 	the possible culprit of latest git tree (pulled both Linus' and
> 	Paul's tree) not booting on a powerbook G4:
hm, checking for linuxppc-dev archives too would be smart. sorry for the
noise

-- 
Aristeu


* Re: [patch][rfc]flattened device tree: Passing a dtb (blob) to Linux.
From: Michael Ellerman @ 2006-04-14 16:19 UTC (permalink / raw)
  To: Jimi Xenidis; +Cc: linuxppc-dev
In-Reply-To: <52F25F7B-ED3C-4A72-A8C9-3631D8D83357@watson.ibm.com>


On Fri, 2006-04-14 at 08:45 -0400, Jimi Xenidis wrote:
> On Apr 13, 2006, at 7:12 PM, Benjamin Herrenschmidt wrote:
> >
> > We should make lmb_reserve() of redundant/overlapping entries become
> > harmless I think.
> 
> Hmm.. I think it would be worthy of a warning, no?

Definitely. We had weird bugs in kexec because regions were overlapping.
I just haven't got 'round to writing the patch .. maybe I should :)

> > We need to be backward compatible with earlier blobs
> > that do contain themselves in the reserve map
> 
> Do you think it is possible that the blob may have a single  
> reservation that includes the blob but is larger? if not then we  
> could simply do..

Well it's definitely possible, but I don't know if anyone actually does
it. But I guess we should try to be nice and not warn for old blobs.

This patch looks like a decent approach, although I'd want to add a
check so that if the blob is reserved _three_ times we do warn, it's
getting a bit silly isn't it :)
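
A minimal user-space sketch of the behaviour being discussed, assuming one
possible policy: an exact duplicate reservation is accepted quietly, a partial
overlap is warned about and rejected. model_reserve(), the table size and the
return codes are hypothetical; this is not the real lmb_reserve().

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_RSV 32	/* arbitrary table size for this model */

struct rsv_region {
	uint64_t base;
	uint64_t size;
};

static struct rsv_region rsv[MODEL_MAX_RSV];
static int rsv_cnt;

/* Half-open ranges [base, base + size) overlap if each starts before the other ends. */
static int regions_overlap(uint64_t b1, uint64_t s1, uint64_t b2, uint64_t s2)
{
	return b1 < b2 + s2 && b2 < b1 + s1;
}

/*
 * Returns 0 when the region is recorded, 1 for an exact duplicate (ignored
 * without a warning), -1 for a partial overlap (warned about and ignored),
 * -2 when the table is full.
 */
static int model_reserve(uint64_t base, uint64_t size)
{
	int i;

	for (i = 0; i < rsv_cnt; i++) {
		if (rsv[i].base == base && rsv[i].size == size)
			return 1;
		if (regions_overlap(base, size, rsv[i].base, rsv[i].size)) {
			fprintf(stderr, "warning: reservation overlaps entry %d\n", i);
			return -1;
		}
	}
	if (rsv_cnt == MODEL_MAX_RSV)
		return -2;
	rsv[rsv_cnt].base = base;
	rsv[rsv_cnt].size = size;
	rsv_cnt++;
	return 0;
}
```

Counting how many times the same region shows up (to warn on the third
attempt, say) would only need a per-entry counter next to base and size.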

cheers

> --
> diff -r eb0990a251a9 arch/powerpc/kernel/prom.c
> --- a/arch/powerpc/kernel/prom.c	Thu Mar 30 22:05:40 2006 -0500
> +++ b/arch/powerpc/kernel/prom.c	Fri Apr 14 08:44:10 2006 -0400
> @@ -1129,9 +1129,17 @@ static void __init early_reserve_mem(voi
> {
> 	u64 base, size;
> 	u64 *reserve_map;
> +	unsigned long self_base;
> +	unsigned long self_size;
> 	reserve_map = (u64 *)(((unsigned long)initial_boot_params) +
> 					initial_boot_params->off_mem_rsvmap);
> +
> +	/* before we do anything, lets reserve the dt blob */
> +	self_base = __pa((unsigned long)initial_boot_params);
> +	self_size = initial_boot_params->totalsize;
> +	lmb_reserve(self_base, self_size);
> +
> #ifdef CONFIG_PPC32
> 	/*
> 	 * Handle the case where we might be booting from an old kexec
> @@ -1146,6 +1154,9 @@ static void __init early_reserve_mem(voi
> 			size_32 = *(reserve_map_32++);
> 			if (size_32 == 0)
> 				break;
> +			/* skip if the reservation is for the blob */
> +			if (base_32 == self_base && size_32 == self_size)
> +				continue;
> 			DBG("reserving: %x -> %x\n", base_32, size_32);
> 			lmb_reserve(base_32, size_32);
> 		}
> @@ -1157,6 +1168,9 @@ static void __init early_reserve_mem(voi
> 		size = *(reserve_map++);
> 		if (size == 0)
> 			break;
> +		/* skip if the reservation is for the blob */
> +		if (base == self_base && size == self_size)
> +			continue;
> 		DBG("reserving: %llx -> %llx\n", base, size);
> 		lmb_reserve(base, size);
> 	}
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



* Re: [PATCH] [1/2] POWERPC: IOMMU support for honoring dma_mask
From: Olof Johansson @ 2006-04-14 18:55 UTC (permalink / raw)
  To: paulus; +Cc: linuxppc-dev, plush, linux-kernel
In-Reply-To: <20060413020559.GC24769@pb15.lixom.net>

On Wed, Apr 12, 2006 at 09:05:59PM -0500, Olof Johansson wrote:

> Since time is somewhat of the essence (if 2.6.17 is still an option), I went
> for the choice of sending now and following up with a small bugfix later
> in case something shows up, especially since quick regression tests seem
> ok.

FYI, I now have a positive test report from a bcm43xx user with a 1.5GB G5.
(Thanks for testing this, Marc!).


-Olof


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Paul Mackerras @ 2006-04-14 19:07 UTC (permalink / raw)
  To: Becky Bruce; +Cc: Michael Schmitz, debian-powerpc, linuxppc-dev list
In-Reply-To: <21F7D7D8-B9BC-44EB-B07B-F888D89DCF25@freescale.com>

Becky Bruce writes:

> Actually, I think the problem is that the code linux is using to turn  
> on nap mode is not guaranteed to put the processor in nap mode by the  
> time the blr in ppc6xx_idle occurs.

Thanks, Becky.

This patch fixes it for me.  Comments, anyone?

Paul.

diff -urN powerpc-merge/arch/powerpc/kernel/idle_6xx.S pmac-2.6.17-rc1/arch/powerpc/kernel/idle_6xx.S
--- powerpc-merge/arch/powerpc/kernel/idle_6xx.S	2006-04-04 23:09:16.000000000 -0700
+++ pmac-2.6.17-rc1/arch/powerpc/kernel/idle_6xx.S	2006-04-14 10:29:54.000000000 -0700
@@ -151,41 +151,47 @@
 	isync
 	mtmsr	r7
 	isync
-	sync
-	blr
+1:	b	1b
 	
 /*
  * Return from NAP/DOZE mode, restore some CPU specific registers,
  * we are called with DR/IR still off and r2 containing physical
- * address of current.
+ * address of current.  R11 and CR contain HID0.  We have to preserve
+ * r10 and r12.
  */
 _GLOBAL(power_save_6xx_restore)
+	tophys(r11, r1)		/* Make the idle task do a blr */
+	lwz	r9,_LINK(r11)
+	stw	r9,_NIP(r11)
 	mfspr	r11,SPRN_HID0
-	rlwinm.	r11,r11,0,10,8	/* Clear NAP & copy NAP bit !state to cr1 EQ */
-	cror	4*cr1+eq,4*cr0+eq,4*cr0+eq
+	rlwinm	r11,r11,0,10,8	/* Clear NAP */
 BEGIN_FTR_SECTION
 	rlwinm	r11,r11,0,9,7	/* Clear DOZE */
 END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
 	mtspr	SPRN_HID0, r11
 
 #ifdef DEBUG
-	beq	cr1,1f
+	bf	9,1f
 	lis	r11,(nap_return_count-KERNELBASE)@ha
 	lwz	r9,nap_return_count@l(r11)
 	addi	r9,r9,1
 	stw	r9,nap_return_count@l(r11)
 1:
 #endif
-	
+
+#ifdef CONFIG_SMP
 	rlwinm	r9,r1,0,0,18
 	tophys(r9,r9)
 	lwz	r11,TI_CPU(r9)
 	slwi	r11,r11,2
+#else
+	li	r11,0
+#endif
 	/* Todo make sure all these are in the same page
-	 * and load r22 (@ha part + CPU offset) only once
+	 * and load r11 (@ha part + CPU offset) only once
 	 */
 BEGIN_FTR_SECTION
-	beq	cr1,1f
+	bf	9,1f
 	addis	r9,r11,(nap_save_msscr0-KERNELBASE)@ha
 	lwz	r9,nap_save_msscr0@l(r9)
 	mtspr	SPRN_MSSCR0, r9


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Olof Johansson @ 2006-04-14 19:54 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Becky Bruce, Michael Schmitz, debian-powerpc, linuxppc-dev list
In-Reply-To: <17471.62187.774127.783000@cargo.ozlabs.ibm.com>

Hi,

On Fri, Apr 14, 2006 at 12:07:23PM -0700, Paul Mackerras wrote:
> Becky Bruce writes:
> 
> > Actually, I think the problem is that the code linux is using to turn  
> > on nap mode is not guaranteed to put the processor in nap mode by the  
> > time the blr in ppc6xx_idle occurs.
> 
> Thanks, Becky.
> 
> This patch fixes it for me.  Comments, anyone?

The bf mnemonics had me scratching my head a while, it's not listed as
a simplified mnemonic in the 64-bit PEM. Two questions below.

>  _GLOBAL(power_save_6xx_restore)
> +	tophys(r11, r1)		/* Make the idle task do a blr */
> +	lwz	r9,_LINK(r11)
> +	stw	r9,_NIP(r11)
>  	mfspr	r11,SPRN_HID0
> -	rlwinm.	r11,r11,0,10,8	/* Clear NAP & copy NAP bit !state to cr1 EQ */
> -	cror	4*cr1+eq,4*cr0+eq,4*cr0+eq
> +	rlwinm	r11,r11,0,10,8	/* Clear NAP */
>  BEGIN_FTR_SECTION
>  	rlwinm	r11,r11,0,9,7	/* Clear DOZE */
>  END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
>  	mtspr	SPRN_HID0, r11
>  
>  #ifdef DEBUG
> -	beq	cr1,1f
> +	bf	9,1f

Where is cr0 set now -- you took the dot off of rlwinm?

>  	lis	r11,(nap_return_count-KERNELBASE)@ha
>  	lwz	r9,nap_return_count@l(r11)
>  	addi	r9,r9,1
>  	stw	r9,nap_return_count@l(r11)
>  1:
>  #endif
> -	
> +
> +#ifdef CONFIG_SMP
>  	rlwinm	r9,r1,0,0,18
>  	tophys(r9,r9)
>  	lwz	r11,TI_CPU(r9)
>  	slwi	r11,r11,2
> +#else
> +	li	r11,0
> +#endif
>  	/* Todo make sure all these are in the same page
> -	 * and load r22 (@ha part + CPU offset) only once
> +	 * and load r11 (@ha part + CPU offset) only once
>  	 */
>  BEGIN_FTR_SECTION
> -	beq	cr1,1f
> +	bf	9,1f

Same comment as above w.r.t. cr0?



-Olof


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Becky Bruce @ 2006-04-14 20:00 UTC (permalink / raw)
  To: Olof Johansson
  Cc: linuxppc-dev list, Michael Schmitz, debian-powerpc,
	Paul Mackerras
In-Reply-To: <20060414195436.GC24769@pb15.lixom.net>

He's being sneaky - there's a copy of HID0 in the CR at this point  
from the caller, and bit 9 is the position for NAP.

-B

On Apr 14, 2006, at 2:54 PM, Olof Johansson wrote:

> Hi,
>
> On Fri, Apr 14, 2006 at 12:07:23PM -0700, Paul Mackerras wrote:
>> Becky Bruce writes:
>>
>>> Actually, I think the problem is that the code linux is using to  
>>> turn
>>> on nap mode is not guaranteed to put the processor in nap mode by  
>>> the
>>> time the blr in ppc6xx_idle occurs.
>>
>> Thanks, Becky.
>>
>> This patch fixes it for me.  Comments, anyone?
>
> The bf mnemonics had me scratching my head a while, it's not listed as
> a simplified mnemonic in the 64-bit PEM. Two questions below.
>
>>  _GLOBAL(power_save_6xx_restore)
>> +	tophys(r11, r1)		/* Make the idle task do a blr */
>> +	lwz	r9,_LINK(r11)
>> +	stw	r9,_NIP(r11)
>>  	mfspr	r11,SPRN_HID0
>> -	rlwinm.	r11,r11,0,10,8	/* Clear NAP & copy NAP bit !state to cr1 EQ */
>> -	cror	4*cr1+eq,4*cr0+eq,4*cr0+eq
>> +	rlwinm	r11,r11,0,10,8	/* Clear NAP */
>>  BEGIN_FTR_SECTION
>>  	rlwinm	r11,r11,0,9,7	/* Clear DOZE */
>>  END_FTR_SECTION_IFSET(CPU_FTR_CAN_DOZE)
>>  	mtspr	SPRN_HID0, r11
>>
>>  #ifdef DEBUG
>> -	beq	cr1,1f
>> +	bf	9,1f
>
> Where is cr0 set now -- you took the dot off of rlwinm?
>
>>  	lis	r11,(nap_return_count-KERNELBASE)@ha
>>  	lwz	r9,nap_return_count@l(r11)
>>  	addi	r9,r9,1
>>  	stw	r9,nap_return_count@l(r11)
>>  1:
>>  #endif
>> -	
>> +
>> +#ifdef CONFIG_SMP
>>  	rlwinm	r9,r1,0,0,18
>>  	tophys(r9,r9)
>>  	lwz	r11,TI_CPU(r9)
>>  	slwi	r11,r11,2
>> +#else
>> +	li	r11,0
>> +#endif
>>  	/* Todo make sure all these are in the same page
>> -	 * and load r22 (@ha part + CPU offset) only once
>> +	 * and load r11 (@ha part + CPU offset) only once
>>  	 */
>>  BEGIN_FTR_SECTION
>> -	beq	cr1,1f
>> +	bf	9,1f
>
> Same comment as above w.r.t. cr0?
>
>
>
> -Olof


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Paul Mackerras @ 2006-04-14 20:19 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Becky Bruce, Michael Schmitz, debian-powerpc, linuxppc-dev list
In-Reply-To: <20060414195436.GC24769@pb15.lixom.net>

Olof Johansson writes:

> Where is cr0 set now -- you took the dot off of rlwinm?

transfer_to_handler does mfspr r11,SPRN_HID0; mtcr r11 before jumping
to power_save_6xx_restore.  The rlwinm. was wrong anyway since it was
setting cr0.eq based on all the *other* bits in HID0, not HID0_NAP
(doh!).
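
To make the point about the old test concrete, here is a small C model of the
rlwinm mask and of the two ways of testing the NAP bit. It is a sketch only:
the helper names are invented, and it models just the bit arithmetic (IBM
numbering, bit 0 is the most significant bit), not the real instruction
semantics in full.

```c
#include <stdint.h>

/*
 * Mask selected by "rlwinm rA,rS,0,mb,me": bits mb..me in IBM numbering,
 * wrapping around through bit 31 when mb > me. mb and me must be 0..31.
 */
static uint32_t rlwinm_mask(unsigned int mb, unsigned int me)
{
	uint32_t from_mb = 0xFFFFFFFFu >> mb;		/* IBM bits mb..31 */
	uint32_t up_to_me = 0xFFFFFFFFu << (31 - me);	/* IBM bits 0..me */

	return (mb <= me) ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Value of IBM bit n of a word, which is what "bf n" looks at after mtcr. */
static int ibm_bit(uint32_t word, unsigned int n)
{
	return (word >> (31 - n)) & 1;
}

/*
 * cr0.eq as left by the old "rlwinm. r11,r11,0,10,8": set only when the
 * masked result is zero, i.e. when every HID0 bit other than bit 9 is clear.
 */
static int old_cr0_eq(uint32_t hid0)
{
	return (hid0 & rlwinm_mask(10, 8)) == 0;
}
```

rlwinm_mask(10, 8) is 0xFFBFFFFF (everything except bit 9) and
rlwinm_mask(9, 7) is 0xFF7FFFFF (everything except bit 8). With any other
HID0 bit set, old_cr0_eq() returns 0 whether or not bit 9 is set, while
ibm_bit(hid0, 9) follows bit 9 itself.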

Paul.


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Olof Johansson @ 2006-04-14 20:24 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Becky Bruce, Michael Schmitz, debian-powerpc, linuxppc-dev list
In-Reply-To: <17472.984.760483.334447@cargo.ozlabs.ibm.com>

On Fri, Apr 14, 2006 at 01:19:36PM -0700, Paul Mackerras wrote:
> Olof Johansson writes:
> 
> > Where is cr0 set now -- you took the dot off of rlwinm?
> 
> transfer_to_handler does mfspr r11,SPRN_HID0; mtcr r11 before jumping
> to power_save_6xx_restore.  The rlwinm. was wrong anyway since it was
> setting cr0.eq based on all the *other* bits in HID0, not HID0_NAP
> (doh!).

Oh, right, you even updated the comment to reflect this. My bad.
Patch looks good to me then.


-Olof


* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Luck, Tony @ 2006-04-14 20:53 UTC (permalink / raw)
  To: Mel Gorman; +Cc: davej, linuxppc-dev, ak, bob.picco, linux-kernel, linux-mm
In-Reply-To: <20060414131235.GA19064@skynet.ie>

On Fri, Apr 14, 2006 at 02:12:35PM +0100, Mel Gorman wrote:
> That appears fine, but I call add_active_range() after a GRANULEROUNDUP and
> GRANULEROUNDDOWN has taken place so that might be the problem, especially as
> all those ranges are aligned on a 16MiB boundary. The following patch calls
> add_active_range() before the rounding takes place. Can you try it out please?

That's good.  Now I see identical output before/after your patch for
the generic (DISCONTIG=y) kernel:

On node 0 totalpages: 259873
  DMA zone: 128931 pages, LIFO batch:7
  Normal zone: 130942 pages, LIFO batch:7

-Tony


* Re: [PATCH] [2/2] POWERPC: Lower threshold for DART enablement to 1GB, V2
From: Benjamin Herrenschmidt @ 2006-04-14 20:57 UTC (permalink / raw)
  To: Muli Ben-Yehuda; +Cc: Olof Johansson, linuxppc-dev, paulus, linux-kernel
In-Reply-To: <20060414144830.GQ10412@granada.merseine.nu>


> What I had in mind is an interface that given a PCI bridge will tell
> you what's the most restrictive DMA mask for a device on that bridge,
> so that you'll know whether you need to enable the IOMMU for that
> bridge. I'll even settle for a function that tells you what's the most
> restrictive DMA mask in the system, period. There's nothing inherently
> arch specific about this.
>
> (and as a side note, the IOMMU we are working on on x86-64 is Calgary,
> which is actually roughly the same chipset used in some PPC
> machines...)

Not sure I ever heard about that... What chipsets ?

Ben.


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Benjamin Herrenschmidt @ 2006-04-14 21:01 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Becky Bruce, Michael Schmitz, debian-powerpc, linuxppc-dev list
In-Reply-To: <17471.62187.774127.783000@cargo.ozlabs.ibm.com>

On Fri, 2006-04-14 at 12:07 -0700, Paul Mackerras wrote:
> Becky Bruce writes:
> 
> > Actually, I think the problem is that the code linux is using to turn  
> > on nap mode is not guaranteed to put the processor in nap mode by the  
> > time the blr in ppc6xx_idle occurs.
> 
> Thanks, Becky.
> 
> This patch fixes it for me.  Comments, anyone?

Looks good to me except that we need the same for ppc64 since the 970
theoretically has the same problem...

Ben.


* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Becky Bruce @ 2006-04-14 21:09 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev list, debian-powerpc
In-Reply-To: <20060414202452.GD24769@pb15.lixom.net>

On Fri, 2006-04-14 at 12:07 -0700, Paul Mackerras wrote:

> Becky Bruce writes:
>
>
>> Actually, I think the problem is that the code linux is using to turn
>> on nap mode is not guaranteed to put the processor in nap mode by the
>> time the blr in ppc6xx_idle occurs.
>>
>
> Thanks, Becky.
>
> This patch fixes it for me.  Comments, anyone?

Patch LGTM, as well.  I like the approach.

Thanks!
-Becky


* [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-14 21:18 UTC (permalink / raw)
  To: LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, tony.luck, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux

The current method of allocating space for per_cpu variables in modules
is not robust and consumes quite a bit of space.

per_cpu variables:

The per_cpu variables are declared by code that needs to have variables
spaced out by cache lines on SMP machines, so that writing to any of
these variables on one CPU won't risk writing into a cache line that
holds a global variable shared by other CPUs.  If that were to happen,
performance would drop because the CPUs would needlessly have to update
cache lines across CPUs, even for read-only global variables.

To solve this, a developer needs only to define a per_cpu variable
using the DEFINE_PER_CPU(type, var) macro.  This places the
variable into the .data.percpu section.  On boot up, an area is
allocated by the size of this section + PERCPU_ENOUGH_ROOM (mentioned
later) times NR_CPUS.  Then the .data.percpu section is copied into this
area once for each of the NR_CPUS CPUs.  The .data.percpu section is
later discarded (the variables now exist in the allocated area).

The __per_cpu_offset[] array holds the difference between
the .data.percpu section and the location where the data is actually
stored. __per_cpu_offset[0] holds the difference for the variables
assigned to cpu 0, __per_cpu_offset[1] holds the difference for the
variables to cpu 1, and so on.

To access a per_cpu variable, the per_cpu(var, cpu) macro is used.  This
macro returns the address of the variable (still pointing to the
discarded .data.percpu section) plus the __per_cpu_offset[cpu]. So the
result is the location to the actual variable for the specified CPU
located in the allocated area.
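
The mechanism above can be modelled in user space roughly as follows. This is
only a sketch: a struct stands in for the .data.percpu section, malloc()
stands in for the boot-time allocation, the extra PERCPU_ENOUGH_ROOM space is
left out, and every name starting with model_ (plus percpu_template) is made
up for the example. __typeof__ is a GCC extension.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 4	/* assumption for this model */

/* Stand-in for .data.percpu: the initial image of every per-CPU variable. */
static struct {
	int myint;
	long counter;
} percpu_template = { 7, 0 };

/* Distance from the template to each CPU's private copy. */
static intptr_t model_per_cpu_offset[MODEL_NR_CPUS];

/* Copy the template once per CPU and record where each copy ended up. */
static int model_setup_per_cpu_areas(void)
{
	char *area = malloc(sizeof(percpu_template) * MODEL_NR_CPUS);
	int cpu;

	if (!area)
		return -1;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		char *copy = area + cpu * sizeof(percpu_template);

		memcpy(copy, &percpu_template, sizeof(percpu_template));
		model_per_cpu_offset[cpu] =
			(intptr_t)copy - (intptr_t)&percpu_template;
	}
	return 0;
}

/* Address of the variable in the template plus the CPU's offset. */
#define model_per_cpu(var, cpu) \
	(*(__typeof__(percpu_template.var) *) \
	  ((intptr_t)&percpu_template.var + model_per_cpu_offset[cpu]))
```

After model_setup_per_cpu_areas(), "model_per_cpu(myint, 1) = 4;" changes only
CPU 1's copy; CPU 0's copy and the template still hold the initial value 7.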

Modules:

Since there is no way to know from per_cpu if the variable was part of a
module, or part of the kernel, the variables for the module need to be
located in the same allocated area as the per_cpu variables created in
the kernel.

Why is that?

The per_cpu variables are used in the kernel basically like normal
variables.  For example:

with:
  DEFINE_PER_CPU(int, myint);

we can do the following:
  per_cpu(myint, cpu) = 4;
  int i = per_cpu(myint, cpu);
  int *p = &per_cpu(myint, cpu);

Not to mention that we can export these variables as well so that a
module can be using a per_cpu variable from the kernel, or even declared
in another module and exported (the net code does this).

Now remember, the variables are still located in the discarded sections,
but their content is in allocated space offset per cpu.  We have a
single array storing these offsets (__per_cpu_offset).  So this makes it
very difficult to define special DEFINE/DECLARE_PER_CPU macros and use
CONFIG_MODULES to play magic in figuring things out.  Mainly because
we have one per_cpu macro that can be used in a module referencing
per_cpu variables declared in the kernel, declared in the given module,
or even declared in another module.

PERCPU_ENOUGH_ROOM:

When you configure an SMP kernel with loadable modules, the kernel needs
to take an aggressive stance and preallocate enough room to hold the
per_cpu variables in all the modules that could be loaded.  To make
matters worse, this space is allocated per cpu!  So if you have a 64
processor machine with loadable modules, you are allocating extra space
for each of the 64 CPUs even if you never load a module that has a
per_cpu variable in it!

Currently PERCPU_ENOUGH_ROOM is defined as 32768 (32K).  On my 2x intel
SMP machine, with my normal configuration, using 2.6.17-rc1, the size
of .data.percpu is 17892 (17K).  So the extra space for the modules is
32768 - 17892 = 14876 (14K).  Now this is needed for every CPU so I am
actually using 14876 * 2 = 29752 (or 29K).

Now looking at the modules that I have loaded, none of them had
a .data.percpu section defined, so that 29K was a complete waste!


So the current solution has two flaws:
1. not robust. If we someday add more modules that together take up
   more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
2. waste of memory.  We have 14K of memory wasted per CPU. Remember
   a 64 processor machine would be wasting 896K of memory!
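
For anyone who wants to plug in their own numbers, the arithmetic above can be
written as two small C helpers. The function names are invented for the
example, and the sketch assumes the .data.percpu size is smaller than
PERCPU_ENOUGH_ROOM.

```c
#include <stddef.h>

#define PERCPU_ENOUGH_ROOM 32768	/* value quoted above */

/* Bytes of one CPU's area that are left over for module variables. */
static size_t module_room_per_cpu(size_t percpu_section_size)
{
	return PERCPU_ENOUGH_ROOM - percpu_section_size;
}

/* The same leftover space summed over all CPUs. */
static size_t module_room_total(size_t percpu_section_size, unsigned int ncpus)
{
	return module_room_per_cpu(percpu_section_size) * ncpus;
}
```

With a 17892-byte section this gives 14876 bytes per CPU and 29752 bytes for
two CPUs. For 64 CPUs the exact product is 952064 bytes; the 896K figure above
comes from rounding down to 14K per CPU first.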


A solution:

I spent some time trying to come up with a solution to all this.
Something that wouldn't be too intrusive to the way things already work.
I received nice input from Andi Kleen and Thomas Gleixner.  I first
tried to use the __builtin_choose_expr and __builtin_types_compatible_p
to determine if a variable is from the kernel or modules at compile
time. Unfortunately, I've been told that makes things too complex and,
even worse, it had "show stopping" flaws.

Ideally this could be resolved at link time of the module, but that too
would require looking into the relocation tables which are different for
every architecture.  This would be too intrusive, and prone to bugs.

So I went for a much simpler solution.  This solution is not optimal in
saving space, but it does much better than what is currently
implemented, and is still easy to understand and manage, which alone may
outweigh an optimal space solution.

First off, if CONFIG_SMP or CONFIG_MODULES is not set, the solution is
the same as it currently is.  So my solution only affects the kernel if
both CONFIG_SMP and CONFIG_MODULES are set (this is the same
configuration that wastes the memory in the current implementation).

I created a new section called .data.percpu_offset.  This section will
hold a pointer for every variable that is declared as per_cpu with
DEFINE_PER_CPU.  Although this wastes space too, the amount of space
needed for my setup (the same configuration that wastes 14K per cpu) is
4368 (4K).  Since this section is not copied for every CPU, this saves
us 10K for the first cpu (14 - 4) and 14K for every CPU after that! So
this saves on my setup 24K. (Note: I noticed that I used the default
NR_CPUS which is 8, so this really saved me 108K).

The data that .data.percpu_offset holds is referenced through the per_cpu
variable name and points to the __per_cpu_offset array.  For modules,
it will point to the per_cpu_offset array of the module.

Example:

 DEFINE_PER_CPU(int, myint);

 would now create a variable called per_cpu_offset__myint in
the .data.percpu_offset section.  This variable will point to the (if
defined in the kernel) __per_cpu_offset[] array.  If this was a module
variable, it would point to the module per_cpu_offset[] array which is
created when the module is loaded.
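
A user-space sketch of how that indirection could work. Only the
per_cpu_offset__ and per_cpu__ name prefixes come from the description above;
the MODEL_ macros, the accessor, the two offset tables and the fixed arrays
standing in for the allocated copies are all assumptions made for the example,
and uintptr_t is used for the offsets where the patch uses unsigned long.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 2	/* assumption for this model */

/* One offset table for the "kernel", a separate one for a "module". */
static uintptr_t kernel_offsets[MODEL_NR_CPUS];
static uintptr_t module_offsets[MODEL_NR_CPUS];

/* Emit the variable plus a pointer to the offset table that applies to it. */
#define MODEL_DEFINE_PER_CPU(type, name, table) \
	uintptr_t *per_cpu_offset__##name = (table); \
	__typeof__(type) per_cpu__##name

/* The accessor goes through the variable's own table pointer. */
#define model_per_cpu(name, cpu) \
	(*(__typeof__(per_cpu__##name) *) \
	  ((uintptr_t)&per_cpu__##name + per_cpu_offset__##name[cpu]))

MODEL_DEFINE_PER_CPU(int, myint, kernel_offsets);
MODEL_DEFINE_PER_CPU(int, modint, module_offsets);

/* Stand-ins for the per-CPU copies allocated at boot and at module load. */
static int kernel_copies[MODEL_NR_CPUS];
static int module_copies[MODEL_NR_CPUS];

/* Fill both tables with the distance from each variable to its copies. */
static void model_setup(void)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		kernel_copies[cpu] = per_cpu__myint;
		kernel_offsets[cpu] = (uintptr_t)&kernel_copies[cpu] -
				      (uintptr_t)&per_cpu__myint;
		module_copies[cpu] = per_cpu__modint;
		module_offsets[cpu] = (uintptr_t)&module_copies[cpu] -
				      (uintptr_t)&per_cpu__modint;
	}
}
```

The same model_per_cpu() accessor then reaches kernel_copies[] for myint and
module_copies[] for modint, with no shared area sized in advance.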

So now I get rid of the PERCPU_ENOUGH_ROOM constant and some of the
complexity in kernel/module.c that shares code with the kernel, and each
module has its own allocation of per_cpu data. And this means the
per_cpu data is more robust (can handle future changes in the modules)
and saves space.


Drawbacks:

The one drawback I have with this is that, because the DEFINE_PER_CPU
macro now defines two variables, you can't declare a "static DEFINE_PER_CPU".
So instead I created a DEFINE_STATIC_PER_CPU macro to handle this case.

The following patch set is against 2.6.17-rc1, but this patch set is
currently only for i386.  I have an x86_64 that I can work on to port,
but I will need the help of others to port to some other archs, mostly
the other 64 bit archs.  I tried to CC the maintainers of the other
archs (those listed in the vmlinux.lds, include/asm-<arch>/percpu.h
files and the MAINTAINERS file).

I'm not going to spam the CC list (nor Andrew) with the rest of the
patches (only 5).  Please see LKML for the rest.

-- Steve


* BDI-2000 Config file for MPC8349 eval board
From: Ben Warren @ 2006-04-14 22:03 UTC (permalink / raw)
  To: Linuxppc-embedded

Hello,

Does anybody have a solid config file for the Freescale MPC8349EMDS eval
board?  I guess the MPC8349ADS would be fine too, since I've been told
they're the same thing.  I've been tweaking the file named
'mcp8349e.cfg' that shipped with the BDI, but it's a bit flaky with my
board.  In particular, sometimes it can't write to the Flash programming
workspace, maybe indicating that the DDR isn't properly set up, but
there have been other things too that are slowly eating at me.

thanks,
Ben


* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Andrew Morton @ 2006-04-14 22:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-mips, davidm, linux-ia64, mj, spyro, joe, ak, linuxppc-dev,
	paulus, benedict.gaster, bjornw, mingo, grundler, starvik,
	torvalds, tglx, rth, chris, tony.luck, linux-kernel, ralf, marc,
	lethal, schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Example:
> 
>  DEFINE_PER_CPU(int, myint);
> 
>  would now create a variable called per_cpu_offset__myint in
> the .data.percpu_offset section.

Suppose two .c files each have

	DEFINE_STATIC_PER_CPU(myint)

Do we end up with two per_cpu_offset__myint's in the same section?


* Re: [PATCH 00/05] robust per_cpu allocation for modules
From: Steven Rostedt @ 2006-04-14 22:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mips, davidm, linux-ia64, mj, spyro, joe, ak, linuxppc-dev,
	paulus, benedict.gaster, bjornw, mingo, grundler, starvik,
	torvalds, tglx, rth, chris, tony.luck, linux-kernel, ralf, marc,
	lethal, schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <20060414150625.3ba369d2.akpm@osdl.org>



On Fri, 14 Apr 2006, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Example:
> >
> >  DEFINE_PER_CPU(int, myint);
> >
> >  would now create a variable called per_cpu_offset__myint in
> > the .data.percpu_offset section.
>
> Suppose two .c files each have
>
> 	DEFINE_STATIC_PER_CPU(myint)
>
> Do we end up with two per_cpu_offset__myint's in the same section?
>

Both variables are defined as static:

ie.
  #define DEFINE_STATIC_PER_CPU(type, name) \
    static __attribute__((__section__(".data.percpu_offset"))) unsigned long *per_cpu_offset__##name; \
    static __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name

So the per_cpu_offset__myint is also static, and gcc should treat it
properly.  Yes, there are probably going to be two variables
named per_cpu_offset__myint in the same section, but each one should
only be visible within the file that defines it.

Works like any other variable that's static, and even the current way
DEFINE_PER_CPU works with statics.

Thanks,

-- Steve


* RE: [PATCH 00/05] robust per_cpu allocation for modules
From: Chen, Kenneth W @ 2006-04-14 22:12 UTC (permalink / raw)
  To: 'Steven Rostedt', LKML, Andrew Morton
  Cc: linux-mips, David Mosberger-Tang, linux-ia64, Martin Mares, spyro,
	Joe Taylor, linuxppc-dev, paulus, benedict.gaster, bjornw,
	Ingo Molnar, grundler, starvik, Linus Torvalds, Thomas Gleixner,
	rth, chris, Luck, Tony, Andi Kleen, ralf, Marc Gauthier, lethal,
	schwidefsky, linux390, davem, parisc-linux
In-Reply-To: <1145049535.1336.128.camel@localhost.localdomain>

Steven Rostedt wrote on Friday, April 14, 2006 2:19 PM
> So the current solution has two flaws:
> 1. not robust. If we someday add more modules that together take up
>    more than 14K, we need to manually update the PERCPU_ENOUGH_ROOM.
> 2. waste of memory.  We have 14K of memory wasted per CPU. Remember
>    a 64 processor machine would be wasting 896K of memory!

For someone who has the money to own a 64-processor machine, 896K of memory
is pocket change ;-)

- Ken


* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Mel Gorman @ 2006-04-14 22:54 UTC (permalink / raw)
  To: Luck, Tony
  Cc: davej, linuxppc-dev, ak, bob.picco, Linux Kernel Mailing List,
	Linux Memory Management List
In-Reply-To: <20060414205345.GA1258@agluck-lia64.sc.intel.com>

On Fri, 14 Apr 2006, Luck, Tony wrote:

> On Fri, Apr 14, 2006 at 02:12:35PM +0100, Mel Gorman wrote:
>> That appears fine, but I call add_active_range() after a GRANULEROUNDUP and
>> GRANULEROUNDDOWN has taken place so that might be the problem, especially as
>> all those ranges are aligned on a 16MiB boundary. The following patch calls
>> add_active_range() before the rounding takes place. Can you try it out please?
>
> That's good.  Now I see identical output before/after your patch for
> the generic (DISCONTIG=y) kernel:
>
> On node 0 totalpages: 259873
>  DMA zone: 128931 pages, LIFO batch:7
>  Normal zone: 130942 pages, LIFO batch:7
>

Very very cool. Thanks for all the testing.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
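
The ordering issue discussed above (registering a range before or after
granule rounding) can be sketched as follows. This is only an illustration
under stated assumptions: the 16MiB granule is taken from the alignment
mentioned in the thread, the page size and the example range are
hypothetical, and the direction of rounding (start down, end up) is assumed
because the thread does not say which bound is rounded which way.

```python
MIB = 1024 * 1024
GRANULE = 16 * MIB       # assumed from the 16MiB alignment in the thread
PAGE_SIZE = 16 * 1024    # hypothetical page size, for illustration only


def granule_round_down(addr, granule=GRANULE):
    """Round addr down to the nearest granule boundary."""
    return addr - (addr % granule)


def granule_round_up(addr, granule=GRANULE):
    """Round addr up to the nearest granule boundary."""
    return granule_round_down(addr + granule - 1, granule)


def pages(start, end, page_size=PAGE_SIZE):
    """Number of whole pages in the half-open range [start, end)."""
    return (end - start) // page_size


# Hypothetical unaligned physical range.
start, end = 1 * MIB, 40 * MIB

# Registering the range as reported versus after rounding gives different
# page counts, so the point at which the range is registered matters.
exact = pages(start, end)
rounded = pages(granule_round_down(start), granule_round_up(end))

if __name__ == "__main__":
    print("pages before rounding:", exact)
    print("pages after rounding: ", rounded)
```

Whichever way the bounds are rounded, a range registered after rounding no
longer matches what the firmware reported, which is consistent with the
page counts only agreeing once the call was moved ahead of the rounding.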

* Re: 7447A strange problem with MSR:POW (WAS: can't boot 2.6.17-rc1)
From: Benjamin Herrenschmidt @ 2006-04-14 22:57 UTC (permalink / raw)
  To: Becky Bruce
  Cc: Olof Johansson, linuxppc-dev list, Michael Schmitz,
	debian-powerpc, Paul Mackerras
In-Reply-To: <BBE6C9EA-53B5-4EAA-A766-DAF241E7040D@freescale.com>

On Fri, 2006-04-14 at 15:00 -0500, Becky Bruce wrote:
> He's being sneaky - there's a copy of HID0 in the CR at this point  
> from the caller, and bit 9 is the position for NAP.

It's a trick I learned from Darwin :) They do that regularly when code is
very CPU-feature dependent, cache code for example: they put the CPU
features bitmask in CR and branch on individual bits of it here and
there.

Ben.
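
A note on the "bit 9" mentioned above: PowerPC documentation numbers bits
from the most significant end, so bit 0 is the MSB of a 32-bit register.
The sketch below only models that numbering convention; it does not touch
the CR, and the helper names are illustrative, not kernel symbols.

```python
def msb0_mask(bit, width=32):
    """Mask for a bit given in PowerPC (MSB = bit 0) numbering."""
    if not 0 <= bit < width:
        raise ValueError("bit out of range")
    return 1 << (width - 1 - bit)


NAP_BIT = 9  # bit position quoted in the thread


def has_nap(hid0_copy):
    """True if the NAP bit is set in a copy of HID0."""
    return bool(hid0_copy & msb0_mask(NAP_BIT))


if __name__ == "__main__":
    print(hex(msb0_mask(NAP_BIT)))
```

In assembly the same test is a single conditional branch on the CR bit,
which is what makes keeping a copy of the register in CR attractive.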

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
From: Mel Gorman @ 2006-04-14 23:50 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: davej, tony.luck, Linux Memory Management List, ak, bob.picco,
	Linux Kernel Mailing List, linuxppc-dev
In-Reply-To: <200604150917.10596.ncunningham@cyclades.com>

On Sat, 15 Apr 2006, Nigel Cunningham wrote:

> It looks to me like this code could be used by the software suspend code in
> our determinations of what pages to save

Potentially yes. Currently, the node map and related functions are marked 
__init so they become unavailable but that is not set in stone.

>, particularly in the context of
> memory hotplug support.

Right now, during memory hot-add, the memory is not registered with
add_active_range(), but it would be straightforward to add the call to
add_memory() on each architecture that supports hotplug, for example.

> Just some food for thought at the moment; I'll see if
> I can come up with a patch when I have some time, but it might help justify
> getting this merged.
>

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Port Linux w/ mbxboot to PPCBoot system
From: Jessica Chen @ 2006-04-14 23:56 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I am new to embedded systems. I am studying ppcboot-1.1.5 and linux
kernel-2.4.4, which come with an mpc852 base board that we want to modify
in the future. In the build process, they use the zImage.initrd
(arch/ppc/mbxboot/zvmlinux.initrd) instead of the raw Linux kernel image
(arch/ppc/coffboot/vmlinux.gz) plus a separate initrd, which is the way the
README file suggests.

My question is:

Since ppcboot is already running, what happens when I boot a kernel that
has the old boot loader code in arch/ppc/mbxboot? Will some parameters be
overwritten? If not, why?

I am very tempted to follow the README and rebuild the kernel with only
vmlinux.gz and port it, but I don't want to create any unrecoverable
results. So I am here to seek advice; maybe this is something obvious to
many people.

Thanks in advance,

Jessica Chen

* Re: Port Linux w/ mbxboot to PPCBoot system
From: Wolfgang Denk @ 2006-04-15  0:12 UTC (permalink / raw)
  To: Jessica Chen; +Cc: linuxppc-embedded
In-Reply-To: <002701c6601f$15b27f30$9afea8c0@tcdomain.com>

Dear Jessica,

in message <002701c6601f$15b27f30$9afea8c0@tcdomain.com> you wrote:
> 
>      I am new to embedded system, I am studying ppcboot-1.1.5 and linux
> kernel-2.4.4 that comes with an mpc852 base board, we want to modify it in

Both PPCBoot and Linux 2.4.4 are *hopelessly* obsolete. It may be OK
to study them to understand the workings, but please don't even dream
of using them for any current work.

> the future.  In the build process, they use the zImage.initrd
> (arch/ppc/mbxboot/zvmlinux.initrd) instead of the raw Linux kernel image

Somebody didn't know what he was doing, it seems.

> since ppcboot is already running, what happens when I boot the kernel that
> has old boot loader code in arch/ppc/mbxboot?  Will some parameters be
> overwritten?  If not, why?

The Linux bootstrap loader code (arch/ppc/mbxboot) will ignore the
parameters passed by U-Boot, will set up its own (hardwired), and
duplicate some of the things that PPCBoot did or would do.

>      I am very tempted to follow the README to re-build the kernel with only
> vmlinux.gz and port it, but I don't want to create any un-recoverable
> results.  So I am here to seek advice, maybe this is something obvious to
> many people.

Don't change anything. Look at it, then drop it. Start using  current
code, i. e. a recent version of U-Boot and a recent Linux kernel.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Good morning. This is the telephone company. Due  to  repairs,  we're
giving  you  advance notice that your service will be cut off indefi-
nitely at ten o'clock. That's two minutes from now.
