LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Intercept System call using Kernel  module is 2.6 kernel
From: Meswani, Mitesh @ 2006-06-06 16:25 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <C26C730943E01145B4F89E37FE0A022002BBC7A2@itdsrvmail02.utep.edu>

[-- Attachment #1: Type: text/plain, Size: 2792 bytes --]

 
 
Hello 
 
 
I am attempting to run some user code with kernel space permission. I am using the ppc64 kernel version 2.6.16-rc4-3-ppc64 for IBM Power5 processors. 
In this kernel module I am trying to implement a function that can be called from user space. 
 
I have found through various posts that using unused system calls and replacing them temporarily can acheive this objective. 
 
This is what I am doing, but its not working, please bear with the slightly long code that follows: 
 
1) since the 2.6 kernel does not export sys_call_table, I grep it from the boot image
 
2) Next I write the kernel module as : 
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/syscalls.h>
unsigned long **sctable;
void *org_func;  /***** Copy of the original calls address ********/

asmlinkage int mitesh_func(void)   
{ 
        printk(KERN_ALERT "Executing mitesh_func...\n"); 
        return 2;
} 

int init_module(void)
{
 unsigned long ptr;
 unsigned long *p;
 ptr = 0x23203404;  /*** some hard coded addresses from grepping for sys_call_table *****/
  p = (unsigned long *)ptr;
  sctable = (unsigned long **)p;
  printk("The address of the system call table is: 0x%x\n",&sctable[0]);
  printk("The address of syscall #137 is: 0x%x\n",sctable[137]);

org_func = (void *) (sctable[137]);  /**** Store the original sys call ****/
 printk("Original func address 0x%x stored \n",org_func);
 sctable[137] = (void *) mitesh_func;  /**** replace with mitesh_func ****/
printk("The new sys call address is 0x%x and stored as : 0x%x\n",mitesh_func, sctable[137]);

  return 0; 
}
void cleanup_module(void)

{
        sctable[137] = (void *) org_func; 
        printk("Upon module unload the sctable #137 address is :0x%x\n",sctable[137]);
        printk("Module is unloaded!\n");
}

3) My user app looks like this:
#include <stdio.h> 
#include <errno.h> 
#include <asm-ppc64/unistd.h> 
#define __NR_mitesh_func 137 
 
_syscall0(int, mitesh_func); 
void main() 
{
        int x=0; 
        x=mitesh_func(); 
        printf("mitesh_func returned %d\n",x);
}  

 
4) I verify from the system logs that when I insmod the kernel module I get all the print statements. I verified from the logs  that the address of the sys_call_table is correctly passed and from /proc/kallsysms I can see that my function mitesh_func has been defined and has the address as indicated in the logs. 
 
The problem is that when I execute my user app I expect to see two things: 
 a) I should see a message in the log "Executing mitesh_func..." and 
 b) A return value of 2 
 
However I get an error value -1 returned. 
 
Any help and ideas are highly appreciated.  
 
Thank you in advance, 
Mitesh 
 

[-- Attachment #2: Type: text/html, Size: 5523 bytes --]

^ permalink raw reply

* Base address of executables - weirdness?
From: H. Peter Anvin @ 2006-06-06 15:42 UTC (permalink / raw)
  To: linuxppc-dev

I'm trying to track down an odd issue with klibc on ppc32.

Until recently, binaries linked with ld defaulted to a base address of 
0x10000000+SIZEOF_HEADERS.  However, recently I've gotten a couple of 
reports -- and I've been able to confirm this on my FC5 system -- that 
some versions of ld links at 0x01800000+SIZEOF_HEADERS.  Needless to 
say, this is more than a bit confusing, *especially* since "ld -verbose" 
still reports:

     PROVIDE (__executable_start = 0x10000000); . = 0x10000000 + 
SIZEOF_HEADERS;

... at the top of the linker script.

I'm rather baffled.  Has anyone else seen this, and/or have any other 
explanation?

	-hpa

^ permalink raw reply

* PowerPC 8266 ADS-PCI(CPM8260) SCC (UART/Ethernet Mode) and SMC in UART mode
From: Aung Soe @ 2006-06-06 14:50 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 705 bytes --]

Dear All

I am working on PowerPC 8266 ADS-PCI board

 with Montavista Linux with 2.4.20 Kernel to make SCC1, SMC 1& 2

in UART mode, SCC2, 3 & 4 in Ethernet mode, and FCC 2 in Fast

Ethernet Mode; all in NMSI and not TDM. The drivers provided with

2.4.20 Linux on Power PC were refered and reused.

http://lxr.linux.no/source/arch/ppc/8260_io/?v=2.4.20;a=ppc

 When there are only 3 SCCs used in Ethernet mode and the last

SCC in UART mode then the system works well.

But when 2 more SMC are added to the UART serial driver, the SCC

Ethernet network driver is not working after all ports working for a while.

I will be appreciate to hear any pointers and hints.

Thanks in advance.

Sincerley

Aung

[-- Attachment #2: Type: text/html, Size: 1292 bytes --]

^ permalink raw reply

* Re: eth0: tx queue full
From: Steve Iribarne (GMail) @ 2006-06-06 14:45 UTC (permalink / raw)
  To: salvatore cusenza; +Cc: linuxppc-embedded
In-Reply-To: <9252a64b0606060113v696adbb7ib43ad95836c0724b@mail.gmail.com>

On 6/6/06, salvatore cusenza <salvatore.cusenza@gmail.com> wrote:
>
> At runtime during the usual life of my board (MPC852 and linux-2.4.20 Denk's
> distribution)
>  I have experienced the following crash:
>
>
> eth0: tx queue full!.
> eth0: tx queue full!.
> eth0: tx queue full!.
>
> Oops: kernel access of bad area, sig: 11
> NIP: C000D440 XER: 00000000 LR: C00BB040 SP: C0C9BC10 REGS: c0c9bb60 TRAP:
> 0300    Tainted: P
> MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> DAR: 00001F9D, DSISR: 000000E4
> TASK = c0c9a000[145] 'L5421' Last syscall: 4
> last math 00000000 last altivec 00000000
> GPR00: 00000000 C0C9BC10 C0C9A000 C0F56D70 00001F99 0000003C C0F56D6C
> 00000007
> GPR08: 00000001 0000003C 00000000 C0F56DB0 C0D83C3C 10071D28 00000000
> C3120000
> GPR16: C311CB04 C311C8D8 C311C754 C0170000 C3120000 C311CB30 00000001
> C0169DA0
> GPR24: F0000E00 00001F9D C0F58400 0000003C 00000040 C2080100 C0F58200
> C0F501B0
> Call backtrace:
> C00BAF8C C00BABC8 C0005848 C3119448 C31194C0 C31194F8 C00066F8
> C0011A48 C00BA8FC C00CADD8 C00C3F00 C0016B50 C00AFE4C C00B4B94
> C00B5EF4 C003571C C000457C 0FFD5E4C 0FEDB8DC 0FEDB284 1003B5B8
> 1003D558 1003A88C 0FED34A4 0FED32D0 0FFCFEE4 0FD5F590
> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
>  <0>Rebooting in 180 seconds..
>
>
> Could you suggest me something to investigate?
>


First thing I'd do is get my hands on ksymoops and look at the callstack.

Check out...

http://www.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/

>From my past experience, seems to me that I have had a similar oops
when the driver code that I have been debugging usually is stuck in a
loop at interrupt time and your missing interrupts.

Or better yet, your just plain crashing at interrupt time.  It sounds
like this is pretty easy to reproduce so follow the readme on
ksymoops.

Hope that helps.

-stv

^ permalink raw reply

* RE: [PATCH/2.6.17-rc4 4/10]Powerpc:  Add tsi108 pic support
From: Alexandre Bounine @ 2006-06-06 14:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Zang Roy-r61911
  Cc: linuxppc-dev list, Paul Mackerras, Yang Xin-Xin-r48390



> -----Original Message-----
> From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> Sent: Tuesday, June 06, 2006 6:17 AM
> To: Zang Roy-r61911
> Cc: Alexandre Bounine; Kumar Gala; linuxppc-dev list; Yang
> Xin-Xin-r48390; Paul Mackerras
> Subject: RE: [PATCH/2.6.17-rc4 4/10]Powerpc: Add tsi108 pic support
>=20
>=20
> On Tue, 2006-06-06 at 17:43 +0800, Zang Roy-r61911 wrote:
>=20
> > Update Tsi108 implementation of MPIC.
> > Any comment?=20
> >=20
> > Integrate Tundra Semiconductor tsi108 host bridge interrupt=20
> controller=20
> > to mpic arch.
>=20
> Looks much better :) Still a few things...=20
>

Sounds good. We are moving in right direction :)
=20
> > +	mpic =3D mpic_alloc(mpic_paddr,
> > +			MPIC_PRIMARY | MPIC_BIG_ENDIAN |=20
> MPIC_WANTS_RESET |
> > +			MPIC_SPV_EOI | MPIC_CASC_NOEOI |=20
> > +			MPIC_MOD_ID(MPIC_ID_TSI108),
> > +			0, /* num_sources used */
> > +			TSI108_IRQ_BASE,
> > +			0, /* num_sources used */
> > +			NR_IRQS - 4 /* XXXX */,
> > +			mpc7448_hpc2_pic_initsenses,
> > +			sizeof(mpc7448_hpc2_pic_initsenses),=20
> "Tsi108_PIC");
>=20
> That's a hell lot of new flags... I'm not sure we need that many or a
> single TSI108 one that encloses all the new ones. Also, I'm=20
> not sure we
> need that model ID encoding thing. Let's do things simple, besides, I
> don't want to encourage HW folks into doing the same kind of=20
> contraption
> in the future

More details in comments below.

>(btw, tell the TSI folks for me that they had a BAD BAD
> BAD idea to muck around with the base design that way, especially
> changing the register map in incompatible ways for no good reason).
>=20

Done!

> > +	/* Configure MPIC outputs to CPU0 */
> > +	tsi108_write_reg(TSI108_MPIC_OFFSET + 0x30c, 0);
> >  }
>=20
> It doesn't use the standard multiple processor outputs mecanism of
> MPIC ?
> =20
> > +static struct mpic_info mpic_infos[] =3D {
> > +	[0] =3D {	/* Original OpenPIC compatible MPIC */
> > +	.greg_base	=3D MPIC_GREG_BASE,
> > +	.greg_frr0	=3D MPIC_GREG_FEATURE_0,
> > +	.greg_config0	=3D MPIC_GREG_GLOBAL_CONF_0,
> > +	.greg_vendor_id	=3D MPIC_GREG_VENDOR_ID,
> > +	.greg_ipi_vp0	=3D MPIC_GREG_IPI_VECTOR_PRI_0,
> > +	.greg_ipi_stride	=3D MPIC_GREG_IPI_STRIDE,
> > +	.greg_spurious	=3D MPIC_GREG_SPURIOUS,
> > +	.greg_tfrr	=3D MPIC_GREG_TIMER_FREQ,
> > +
>=20
>    .../...
>=20
> It's a bit sad to have to go all the way to doing such tables, but I
> suspect it's probably the best way to handle it at this=20
> point.

> Send more
> nastygrams to the HW folks for me.
>=20

Done:)

> >  	mpic->num_sources =3D 0; /* so far */
> >  	mpic->senses =3D senses;
> >  	mpic->senses_count =3D senses_count;
> > +	mpic->hw_set =3D &mpic_infos[MPIC_GET_MOD_ID(flags)];
>=20
> Well... the model ID thing might not be that a bad idea in=20
> the end :) I
> need to think about it. I might have to deal with yet another=20
> MPIC that
> has another regiser map (yeah yeah, TSI aren't the only ones=20
> to not get
> it)...=20
>

I'll tell this to HW guys as well :)=20

>   .../...
>=20
> > @@ -963,7 +1043,7 @@ int mpic_get_one_irq(struct mpic *mpic,=20
> >  {
> >  	u32 irq;
> > =20
> > -	irq =3D mpic_cpu_read(MPIC_CPU_INTACK) & MPIC_VECPRI_VECTOR_MASK;
> > +	irq =3D mpic_cpu_read(mpic->hw_set->cpu_intack) &=20
> mpic->hw_set->irq_vpr_vector;
> >  #ifdef DEBUG_LOW
> >  	DBG("%s: get_one_irq(): %d\n", mpic->name, irq);
> >  #endif
> > @@ -972,11 +1052,18 @@ #ifdef DEBUG_LOW
> >  		DBG("%s: cascading ...\n", mpic->name);
> >  #endif
> >  		irq =3D mpic->cascade(regs, mpic->cascade_data);
> > -		mpic_eoi(mpic);
> > +#ifdef DEBUG_LOW
> > +		DBG("%s: cascaded irq: %d\n", mpic->name, irq);
> > +#endif
> > +		if (!(mpic->flags & MPIC_CASC_NOEOI))
> > +			mpic_eoi(mpic);
> >  		return irq;
> >  	}
>=20
> Can you tell me why you need the above ? (Why you aren't EOI'ing the
> cascade ?) Note that the cascade handling is going away from=20
> mpic anyway
> with the port to genirq that I'll publish later this week for=20
> 2.6.18 and
> it will almost be handled as a normal interrupt...
>=20

We have a level-signalled irq from the cascaded PCI interrupt =
controller. If I do EOI at=20
this time, level request will not have chance to be cleared (unless all =
PCI interrupts have
an SA_INTERRUPT flag) and result in recurring interrupts.=20

I chose to have an individual flag instead of checking model ID to avoid =
multiple checks within ISR (in case if we have more that one mpic =
version requiring this option). I also expect that it may be useful for =
any external level-signalling cascades connected to MPIC.     =20

> > -	if (unlikely(irq =3D=3D MPIC_VEC_SPURRIOUS))
> > +	if (unlikely(irq =3D=3D MPIC_VEC_SPURRIOUS)) {
> > +		if (mpic->flags & MPIC_SPV_EOI)
> > +			mpic_eoi(mpic);
> >  		return -1;
> > +	}
>=20
> I think the above thing could just test the model ID. It's=20
> unlikely that
> another implementation need the same "feature", so just test the model
> ID rather than adding a flag and if we ever have another=20
> model with the
> same "feature", then we'll go back to adding a flag :)
>=20

Motivation is the same as above - I just do not want to have multiple ID =
checks here. I agree that it is driven by mpic type (model ID) only. I =
can remove this one if you do not expect any
new "broken" MPICs on horizon. =20

> Cheers,
> Ben.
>=20
Thanks for your feedback,
Alex.
>=20
>=20

^ permalink raw reply

* [PATCH 2/2] powerpc: node-away dma allocations
From: Christoph Hellwig @ 2006-06-06 14:11 UTC (permalink / raw)
  To: linuxppc-dev

Make sure dma_alloc_coherent allocates memory from the local node.  This
is important on Cell where we avoid going through the slow cpu
interconnect.

Note:  I could only test this patch on Cell, it should be verified on
some pseries machine by thos that have the hardware.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/arch/powerpc/kernel/iommu.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/iommu.c	2006-04-25 15:53:07.000000000 +0200
+++ linux-2.6/arch/powerpc/kernel/iommu.c	2006-05-30 14:54:25.000000000 +0200
@@ -536,11 +536,12 @@
  * to the dma address (mapping) of the first page.
  */
 void *iommu_alloc_coherent(struct iommu_table *tbl, size_t size,
-		dma_addr_t *dma_handle, unsigned long mask, gfp_t flag)
+		dma_addr_t *dma_handle, unsigned long mask, gfp_t flag, int node)
 {
 	void *ret = NULL;
 	dma_addr_t mapping;
 	unsigned int npages, order;
+	struct page *page;
 
 	size = PAGE_ALIGN(size);
 	npages = size >> PAGE_SHIFT;
@@ -560,9 +561,10 @@
 		return NULL;
 
 	/* Alloc enough pages (and possibly more) */
-	ret = (void *)__get_free_pages(flag, order);
-	if (!ret)
+	page = alloc_pages_node(flag, order, node);
+	if (!page)
 		return NULL;
+	ret = page_address(page);
 	memset(ret, 0, size);
 
 	/* Set up tces to cover the allocated range */
@@ -570,9 +572,9 @@
 			      mask >> PAGE_SHIFT, order);
 	if (mapping == DMA_ERROR_CODE) {
 		free_pages((unsigned long)ret, order);
-		ret = NULL;
-	} else
-		*dma_handle = mapping;
+		return NULL;
+	}
+	*dma_handle = mapping;
 	return ret;
 }
 
Index: linux-2.6/arch/powerpc/kernel/pci_iommu.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/pci_iommu.c	2006-04-25 15:53:07.000000000 +0200
+++ linux-2.6/arch/powerpc/kernel/pci_iommu.c	2006-05-30 14:55:18.000000000 +0200
@@ -86,7 +86,8 @@
 			   dma_addr_t *dma_handle, gfp_t flag)
 {
 	return iommu_alloc_coherent(devnode_table(hwdev), size, dma_handle,
-			device_to_mask(hwdev), flag);
+			device_to_mask(hwdev), flag,
+			pcibus_to_node(to_pci_dev(hwdev)->bus));
 }
 
 static void pci_iommu_free_coherent(struct device *hwdev, size_t size,
Index: linux-2.6/arch/powerpc/kernel/vio.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/vio.c	2006-04-25 15:53:07.000000000 +0200
+++ linux-2.6/arch/powerpc/kernel/vio.c	2006-05-30 14:54:38.000000000 +0200
@@ -229,7 +229,7 @@
 			   dma_addr_t *dma_handle, gfp_t flag)
 {
 	return iommu_alloc_coherent(to_vio_dev(dev)->iommu_table, size,
-			dma_handle, ~0ul, flag);
+			dma_handle, ~0ul, flag, -1);
 }
 
 static void vio_free_coherent(struct device *dev, size_t size,
Index: linux-2.6/include/asm-powerpc/iommu.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/iommu.h	2006-04-25 15:53:07.000000000 +0200
+++ linux-2.6/include/asm-powerpc/iommu.h	2006-05-30 14:55:45.000000000 +0200
@@ -76,7 +76,8 @@
 		int nelems, enum dma_data_direction direction);
 
 extern void *iommu_alloc_coherent(struct iommu_table *tbl, size_t size,
-		dma_addr_t *dma_handle, unsigned long mask, gfp_t flag);
+		dma_addr_t *dma_handle, unsigned long mask,
+		gfp_t flag, int node);
 extern void iommu_free_coherent(struct iommu_table *tbl, size_t size,
 		void *vaddr, dma_addr_t dma_handle);
 extern dma_addr_t iommu_map_single(struct iommu_table *tbl, void *vaddr,

^ permalink raw reply

* [PATCH 1/2] powerpc: implement pcibus_to_node and pcibus_to_cpumask
From: Christoph Hellwig @ 2006-06-06 14:09 UTC (permalink / raw)
  To: linuxppc-dev

On 64bit powerpc we can find out what node a pci bus hangs off, so
implement the topology.h macros that export this information.

For 32bit this seems a little more difficult, but I don't know of 32bit
powerpc NUMA machines either, so let's leave it out for now.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/include/asm-powerpc/topology.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/topology.h	2006-05-02 16:26:14.000000000 +0200
+++ linux-2.6/include/asm-powerpc/topology.h	2006-05-30 14:42:18.000000000 +0200
@@ -32,8 +32,13 @@
 
 int of_node_to_nid(struct device_node *device);
 
+#ifdef CONFIG_PPC64
+#define pcibus_to_node(bus)	(of_node_to_nid(bus->sysdata))
+#define pcibus_to_cpumask(bus)	(node_to_cpumask(of_node_to_nid(bus->sysdata)))
+#else
 #define pcibus_to_node(node)    (-1)
 #define pcibus_to_cpumask(bus)	(cpu_online_map)
+#endif
 
 /* sched_domains SD_NODE_INIT for PPC64 machines */
 #define SD_NODE_INIT (struct sched_domain) {		\

^ permalink raw reply

* Re: [Alsa-devel] [RFC 4/8] snd-aoa: add i2sbus
From: Takashi Iwai @ 2006-06-06 14:00 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linuxppc-dev, alsa-devel
In-Reply-To: <1149592647.5928.49.camel@johannes.berg>

At Tue, 06 Jun 2006 13:17:27 +0200,
Johannes Berg wrote:
> 
> On Fri, 2006-06-02 at 16:23 +0200, Takashi Iwai wrote:
> > > +	if (I2S_CLOCK_SPEED_18MHz % rate == 0) {
> > > +		if ((I2S_CLOCK_SPEED_18MHz / rate) % mclk == 0) {
> > 
> > Equivalent with "I2S_CLOCK_SPEED_18MHZ % (rate * mclk) == 0" ?
> 
> Yeah, I guess, never really thought about that, just wrote it down the
> way I thought to do it :) That said, I think it's more readable if
> written that way, do you want me to change it regardless?

I found a single if is more readable (and good for compiler).

> > > +	/* well, we really should support scatter/gather DMA */
> > > +	/* FIXME FIXME FIXME: If this fails, we BUG() when the alsa layer
> > > +	 * later tries to allocate memory. Apparently we should be setting
> > > +	 * some device pointer for that ...
> > > +	 */
> > > +	snd_pcm_lib_preallocate_pages_for_all(
> > > +		dev->pcm, SNDRV_DMA_TYPE_DEV,
> > > +		snd_dma_pci_data(macio_get_pci_dev(i2sdev->macio)),
> > > +		64 * 1024, 64 * 1024);
> > 
> > Is the comment true?  Yes, you have to set the device pointer via
> > snd_pcm_lib_preallocate*().  But it must be OK even if preallocate
> > fails.
> 
> Hah, I don't know actually, I didn't know you set the pointer using this
> function, when I wrote the comment I just had forgotten the preallocate
> call!
> Does that mean that _preallocate_pages_for_all() has the side effect of
> setting the pointer? If so, imho that's pretty bad.

No, the only requirement is that you have to call snd_pcm_lib_malloc()
with proper type and assigned device pointer if you use
snd_pcm_lib_malloc() function.  (If not called, you've got an error
when compiled with debug option.)


Takashi

^ permalink raw reply

* [PATCH 5/5] Have ia64 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-06-06 13:48 UTC (permalink / raw)
  To: akpm
  Cc: davej, tony.luck, linuxppc-dev, Mel Gorman, linux-kernel,
	bob.picco, ak, linux-mm
In-Reply-To: <20060606134710.21419.48239.sendpatchset@skynet.skynet.ie>


Size zones and holes in an architecture independent manner for ia64.


 arch/ia64/Kconfig          |    3 ++
 arch/ia64/mm/contig.c      |   60 +++++-----------------------------------
 arch/ia64/mm/discontig.c   |   41 ++++-----------------------
 arch/ia64/mm/init.c        |   12 ++++++++
 include/asm-ia64/meminit.h |    1 
 5 files changed, 30 insertions(+), 87 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/Kconfig linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/Kconfig
--- linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/Kconfig	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/Kconfig	2006-06-05 14:17:47.000000000 +0100
@@ -353,6 +353,9 @@ config NODES_SHIFT
 	  MAX_NUMNODES will be 2^(This value).
 	  If in doubt, use the default.
 
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
+
 # VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent.
 # VIRTUAL_MEM_MAP has been retained for historical reasons.
 config VIRTUAL_MEM_MAP
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/mm/contig.c linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/mm/contig.c
--- linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/mm/contig.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/mm/contig.c	2006-06-05 14:17:47.000000000 +0100
@@ -26,10 +26,6 @@
 #include <asm/sections.h>
 #include <asm/mca.h>
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-static unsigned long num_dma_physpages;
-#endif
-
 /**
  * show_mem - display a memory statistics summary
  *
@@ -210,18 +206,6 @@ count_pages (u64 start, u64 end, void *a
 	return 0;
 }
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-static int
-count_dma_pages (u64 start, u64 end, void *arg)
-{
-	unsigned long *count = arg;
-
-	if (start < MAX_DMA_ADDRESS)
-		*count += (min(end, MAX_DMA_ADDRESS) - start) >> PAGE_SHIFT;
-	return 0;
-}
-#endif
-
 /*
  * Set up the page tables.
  */
@@ -230,47 +214,24 @@ void __init
 paging_init (void)
 {
 	unsigned long max_dma;
-	unsigned long zones_size[MAX_NR_ZONES];
 #ifdef CONFIG_VIRTUAL_MEM_MAP
-	unsigned long zholes_size[MAX_NR_ZONES];
+	unsigned long nid = 0;
 	unsigned long max_gap;
 #endif
 
-	/* initialize mem_map[] */
-
-	memset(zones_size, 0, sizeof(zones_size));
-
 	num_physpages = 0;
 	efi_memmap_walk(count_pages, &num_physpages);
 
 	max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT;
 
 #ifdef CONFIG_VIRTUAL_MEM_MAP
-	memset(zholes_size, 0, sizeof(zholes_size));
-
-	num_dma_physpages = 0;
-	efi_memmap_walk(count_dma_pages, &num_dma_physpages);
-
-	if (max_low_pfn < max_dma) {
-		zones_size[ZONE_DMA] = max_low_pfn;
-		zholes_size[ZONE_DMA] = max_low_pfn - num_dma_physpages;
-	} else {
-		zones_size[ZONE_DMA] = max_dma;
-		zholes_size[ZONE_DMA] = max_dma - num_dma_physpages;
-		if (num_physpages > num_dma_physpages) {
-			zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
-			zholes_size[ZONE_NORMAL] =
-				((max_low_pfn - max_dma) -
-				 (num_physpages - num_dma_physpages));
-		}
-	}
-
 	max_gap = 0;
+	efi_memmap_walk(register_active_ranges, &nid);
 	efi_memmap_walk(find_largest_hole, (u64 *)&max_gap);
 	if (max_gap < LARGE_GAP) {
 		vmem_map = (struct page *) 0;
-		free_area_init_node(0, NODE_DATA(0), zones_size, 0,
-				    zholes_size);
+		free_area_init_nodes(max_dma, max_dma,
+				max_low_pfn, max_low_pfn);
 	} else {
 		unsigned long map_size;
 
@@ -282,19 +243,14 @@ paging_init (void)
 		efi_memmap_walk(create_mem_map_page_table, NULL);
 
 		NODE_DATA(0)->node_mem_map = vmem_map;
-		free_area_init_node(0, NODE_DATA(0), zones_size,
-				    0, zholes_size);
+		free_area_init_nodes(max_dma, max_dma,
+				max_low_pfn, max_low_pfn);
 
 		printk("Virtual mem_map starts at 0x%p\n", mem_map);
 	}
 #else /* !CONFIG_VIRTUAL_MEM_MAP */
-	if (max_low_pfn < max_dma)
-		zones_size[ZONE_DMA] = max_low_pfn;
-	else {
-		zones_size[ZONE_DMA] = max_dma;
-		zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
-	}
-	free_area_init(zones_size);
+	add_active_range(0, 0, max_low_pfn);
+	free_area_init_nodes(max_dma, max_dma, max_low_pfn, max_low_pfn);
 #endif /* !CONFIG_VIRTUAL_MEM_MAP */
 	zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
 }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/mm/discontig.c linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/mm/discontig.c
--- linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/mm/discontig.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/mm/discontig.c	2006-06-05 14:17:47.000000000 +0100
@@ -703,6 +703,7 @@ static __init int count_node_pages(unsig
 {
 	unsigned long end = start + len;
 
+	add_active_range(node, start >> PAGE_SHIFT, end >> PAGE_SHIFT);
 	mem_data[node].num_physpages += len >> PAGE_SHIFT;
 	if (start <= __pa(MAX_DMA_ADDRESS))
 		mem_data[node].num_dma_physpages +=
@@ -727,9 +728,8 @@ static __init int count_node_pages(unsig
 void __init paging_init(void)
 {
 	unsigned long max_dma;
-	unsigned long zones_size[MAX_NR_ZONES];
-	unsigned long zholes_size[MAX_NR_ZONES];
 	unsigned long pfn_offset = 0;
+	unsigned long max_pfn = 0;
 	int node;
 
 	max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT;
@@ -746,47 +746,18 @@ void __init paging_init(void)
 #endif
 
 	for_each_online_node(node) {
-		memset(zones_size, 0, sizeof(zones_size));
-		memset(zholes_size, 0, sizeof(zholes_size));
-
 		num_physpages += mem_data[node].num_physpages;
-
-		if (mem_data[node].min_pfn >= max_dma) {
-			/* All of this node's memory is above ZONE_DMA */
-			zones_size[ZONE_NORMAL] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn;
-			zholes_size[ZONE_NORMAL] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn -
-				mem_data[node].num_physpages;
-		} else if (mem_data[node].max_pfn < max_dma) {
-			/* All of this node's memory is in ZONE_DMA */
-			zones_size[ZONE_DMA] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn;
-			zholes_size[ZONE_DMA] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn -
-				mem_data[node].num_dma_physpages;
-		} else {
-			/* This node has memory in both zones */
-			zones_size[ZONE_DMA] = max_dma -
-				mem_data[node].min_pfn;
-			zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] -
-				mem_data[node].num_dma_physpages;
-			zones_size[ZONE_NORMAL] = mem_data[node].max_pfn -
-				max_dma;
-			zholes_size[ZONE_NORMAL] = zones_size[ZONE_NORMAL] -
-				(mem_data[node].num_physpages -
-				 mem_data[node].num_dma_physpages);
-		}
-
 		pfn_offset = mem_data[node].min_pfn;
 
 #ifdef CONFIG_VIRTUAL_MEM_MAP
 		NODE_DATA(node)->node_mem_map = vmem_map + pfn_offset;
 #endif
-		free_area_init_node(node, NODE_DATA(node), zones_size,
-				    pfn_offset, zholes_size);
+		if (mem_data[node].max_pfn > max_pfn)
+			max_pfn = mem_data[node].max_pfn;
 	}
 
+	free_area_init_nodes(max_dma, max_dma, max_pfn, max_pfn);
+
 	zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
 }
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/mm/init.c linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/mm/init.c
--- linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/ia64/mm/init.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/arch/ia64/mm/init.c	2006-06-05 14:17:47.000000000 +0100
@@ -539,6 +539,18 @@ find_largest_hole (u64 start, u64 end, v
 	last_end = end;
 	return 0;
 }
+
+int __init
+register_active_ranges(u64 start, u64 end, void *nid)
+{
+	BUG_ON(nid == NULL);
+	BUG_ON(*(unsigned long *)nid >= MAX_NUMNODES);
+
+	add_active_range(*(unsigned long *)nid,
+				__pa(start) >> PAGE_SHIFT,
+				__pa(end) >> PAGE_SHIFT);
+	return 0;
+}
 #endif /* CONFIG_VIRTUAL_MEM_MAP */
 
 static int __init
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/include/asm-ia64/meminit.h linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/include/asm-ia64/meminit.h
--- linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/include/asm-ia64/meminit.h	2006-06-05 14:12:51.000000000 +0100
+++ linux-2.6.17-rc5-mm3-105-ia64_use_init_nodes/include/asm-ia64/meminit.h	2006-06-05 14:17:47.000000000 +0100
@@ -55,6 +55,7 @@ extern void efi_memmap_init(unsigned lon
   extern unsigned long vmalloc_end;
   extern struct page *vmem_map;
   extern int find_largest_hole (u64 start, u64 end, void *arg);
+  extern int register_active_ranges (u64 start, u64 end, void *arg);
   extern int create_mem_map_page_table (u64 start, u64 end, void *arg);
 #endif
 

^ permalink raw reply

* [PATCH 4/5] Have x86_64 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-06-06 13:48 UTC (permalink / raw)
  To: akpm
  Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
	linux-kernel, linuxppc-dev
In-Reply-To: <20060606134710.21419.48239.sendpatchset@skynet.skynet.ie>


Size zones and holes in an architecture independent manner for x86_64.


 arch/x86_64/Kconfig         |    3 
 arch/x86_64/kernel/e820.c   |  125 ++++++++++++++-------------------------
 arch/x86_64/kernel/setup.c  |    7 +-
 arch/x86_64/mm/init.c       |   62 -------------------
 arch/x86_64/mm/k8topology.c |    3 
 arch/x86_64/mm/numa.c       |   18 ++---
 arch/x86_64/mm/srat.c       |   11 ++-
 include/asm-x86_64/e820.h   |    5 -
 include/asm-x86_64/proto.h  |    2 
 9 files changed, 79 insertions(+), 157 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/Kconfig linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/Kconfig
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/Kconfig	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/Kconfig	2006-06-05 14:16:49.000000000 +0100
@@ -73,6 +73,9 @@ config ARCH_MAY_HAVE_PC_FDC
 	bool
 	default y
 
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
+
 config DMI
 	bool
 	default y
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/kernel/e820.c linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/kernel/e820.c
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/kernel/e820.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/kernel/e820.c	2006-06-05 14:16:49.000000000 +0100
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <linux/kexec.h>
 #include <linux/module.h>
+#include <linux/mm.h>
 
 #include <asm/page.h>
 #include <asm/e820.h>
@@ -160,58 +161,14 @@ unsigned long __init find_e820_area(unsi
 	return -1UL;		
 } 
 
-/* 
- * Free bootmem based on the e820 table for a node.
- */
-void __init e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end)
-{
-	int i;
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i]; 
-		unsigned long last, addr;
-
-		if (ei->type != E820_RAM || 
-		    ei->addr+ei->size <= start || 
-		    ei->addr >= end)
-			continue;
-
-		addr = round_up(ei->addr, PAGE_SIZE);
-		if (addr < start) 
-			addr = start;
-
-		last = round_down(ei->addr + ei->size, PAGE_SIZE); 
-		if (last >= end)
-			last = end; 
-
-		if (last > addr && last-addr >= PAGE_SIZE)
-			free_bootmem_node(pgdat, addr, last-addr);
-	}
-}
-
 /*
  * Find the highest page frame number we have available
  */
 unsigned long __init e820_end_of_ram(void)
 {
-	int i;
 	unsigned long end_pfn = 0;
 	
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i]; 
-		unsigned long start, end;
-
-		start = round_up(ei->addr, PAGE_SIZE); 
-		end = round_down(ei->addr + ei->size, PAGE_SIZE); 
-		if (start >= end)
-			continue;
-		if (ei->type == E820_RAM) { 
-		if (end > end_pfn<<PAGE_SHIFT)
-			end_pfn = end>>PAGE_SHIFT;
-		} else { 
-			if (end > end_pfn_map<<PAGE_SHIFT) 
-				end_pfn_map = end>>PAGE_SHIFT;
-		} 
-	}
+	end_pfn = find_max_pfn_with_active_regions();
 
 	if (end_pfn > end_pfn_map) 
 		end_pfn_map = end_pfn;
@@ -222,43 +179,10 @@ unsigned long __init e820_end_of_ram(voi
 	if (end_pfn > end_pfn_map) 
 		end_pfn = end_pfn_map; 
 
+	printk("end_pfn_map = %lu\n", end_pfn_map);
 	return end_pfn;	
 }
 
-/* 
- * Compute how much memory is missing in a range.
- * Unlike the other functions in this file the arguments are in page numbers.
- */
-unsigned long __init
-e820_hole_size(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long ram = 0;
-	unsigned long start = start_pfn << PAGE_SHIFT;
-	unsigned long end = end_pfn << PAGE_SHIFT;
-	int i;
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i];
-		unsigned long last, addr;
-
-		if (ei->type != E820_RAM ||
-		    ei->addr+ei->size <= start ||
-		    ei->addr >= end)
-			continue;
-
-		addr = round_up(ei->addr, PAGE_SIZE);
-		if (addr < start)
-			addr = start;
-
-		last = round_down(ei->addr + ei->size, PAGE_SIZE);
-		if (last >= end)
-			last = end;
-
-		if (last > addr)
-			ram += last - addr;
-	}
-	return ((end - start) - ram) >> PAGE_SHIFT;
-}
-
 /*
  * Mark e820 reserved areas as busy for the resource manager.
  */
@@ -293,6 +217,49 @@ void __init e820_reserve_resources(void)
 	}
 }
 
+/* Walk the e820 map and register active regions within a node */
+void __init
+e820_register_active_regions(int nid, unsigned long start_pfn,
+							unsigned long end_pfn)
+{
+	int i;
+	unsigned long ei_startpfn, ei_endpfn;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		ei_startpfn = round_up(ei->addr, PAGE_SIZE) >> PAGE_SHIFT;
+		ei_endpfn = round_down(ei->addr + ei->size, PAGE_SIZE)
+								>> PAGE_SHIFT;
+
+		/* Skip map entries smaller than a page */
+		if (ei_startpfn > ei_endpfn)
+			continue;
+
+		/* Check if end_pfn_map should be updated */
+		if (ei->type != E820_RAM && ei_endpfn > end_pfn_map)
+			end_pfn_map = ei_endpfn;
+
+		/* Skip if map is outside the node */
+		if (ei->type != E820_RAM ||
+				ei_endpfn <= start_pfn ||
+				ei_startpfn >= end_pfn)
+			continue;
+
+		/* Check for overlaps */
+		if (ei_startpfn < start_pfn)
+			ei_startpfn = start_pfn;
+		if (ei_endpfn > end_pfn)
+			ei_endpfn = end_pfn;
+
+		/* Obey end_user_pfn to save on memmap */
+		if (ei_startpfn >= end_user_pfn)
+			continue;
+		if (ei_endpfn > end_user_pfn)
+			ei_endpfn = end_user_pfn;
+
+		add_active_range(nid, ei_startpfn, ei_endpfn);
+	}
+}
+
 /* 
  * Add a memory region to the kernel e820 map.
  */ 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/kernel/setup.c linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/kernel/setup.c
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/kernel/setup.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/kernel/setup.c	2006-06-05 14:16:49.000000000 +0100
@@ -466,7 +466,8 @@ contig_initmem_init(unsigned long start_
 	if (bootmap == -1L)
 		panic("Cannot find bootmem map of size %ld\n",bootmap_size);
 	bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);
-	e820_bootmem_free(NODE_DATA(0), 0, end_pfn << PAGE_SHIFT);
+	e820_register_active_regions(0, start_pfn, end_pfn);
+	free_bootmem_with_active_regions(0, end_pfn);
 	reserve_bootmem(bootmap, bootmap_size);
 } 
 #endif
@@ -640,6 +641,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_identify_cpu(&boot_cpu_data);
 
+	e820_register_active_regions(0, 0, -1UL);
 	/*
 	 * partially used pages are not usable - thus
 	 * we are rounding upwards:
@@ -665,6 +667,9 @@ void __init setup_arch(char **cmdline_p)
 	acpi_boot_table_init();
 #endif
 
+	/* Remove active ranges so rediscovery with NUMA-awareness happens */
+	remove_all_active_ranges();
+
 #ifdef CONFIG_ACPI_NUMA
 	/*
 	 * Parse SRAT to discover nodes.
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/init.c linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/init.c
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/init.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/init.c	2006-06-05 14:16:49.000000000 +0100
@@ -404,69 +404,12 @@ void __cpuinit zap_low_mappings(int cpu)
 	__flush_tlb_all();
 }
 
-/* Compute zone sizes for the DMA and DMA32 zones in a node. */
-__init void
-size_zones(unsigned long *z, unsigned long *h,
-	   unsigned long start_pfn, unsigned long end_pfn)
-{
- 	int i;
- 	unsigned long w;
-
- 	for (i = 0; i < MAX_NR_ZONES; i++)
- 		z[i] = 0;
-
- 	if (start_pfn < MAX_DMA_PFN)
- 		z[ZONE_DMA] = MAX_DMA_PFN - start_pfn;
- 	if (start_pfn < MAX_DMA32_PFN) {
- 		unsigned long dma32_pfn = MAX_DMA32_PFN;
- 		if (dma32_pfn > end_pfn)
- 			dma32_pfn = end_pfn;
- 		z[ZONE_DMA32] = dma32_pfn - start_pfn;
- 	}
- 	z[ZONE_NORMAL] = end_pfn - start_pfn;
-
- 	/* Remove lower zones from higher ones. */
- 	w = 0;
- 	for (i = 0; i < MAX_NR_ZONES; i++) {
- 		if (z[i])
- 			z[i] -= w;
- 	        w += z[i];
-	}
-
-	/* Compute holes */
-	w = start_pfn;
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		unsigned long s = w;
-		w += z[i];
-		h[i] = e820_hole_size(s, w);
-	}
-
-	/* Add the space pace needed for mem_map to the holes too. */
-	for (i = 0; i < MAX_NR_ZONES; i++)
-		h[i] += (z[i] * sizeof(struct page)) / PAGE_SIZE;
-
-	/* The 16MB DMA zone has the kernel and other misc mappings.
- 	   Account them too */
-	if (h[ZONE_DMA]) {
-		h[ZONE_DMA] += dma_reserve;
-		if (h[ZONE_DMA] >= z[ZONE_DMA]) {
-			printk(KERN_WARNING
-				"Kernel too large and filling up ZONE_DMA?\n");
-			h[ZONE_DMA] = z[ZONE_DMA];
-		}
-	}
-}
-
 #ifndef CONFIG_NUMA
 void __init paging_init(void)
 {
-	unsigned long zones[MAX_NR_ZONES], holes[MAX_NR_ZONES];
-
 	memory_present(0, 0, end_pfn);
 	sparse_init();
-	size_zones(zones, holes, 0, end_pfn);
-	free_area_init_node(0, NODE_DATA(0), zones,
-			    __pa(PAGE_OFFSET) >> PAGE_SHIFT, holes);
+	free_area_init_nodes(MAX_DMA_PFN, MAX_DMA32_PFN, end_pfn, end_pfn);
 }
 #endif
 
@@ -615,7 +558,8 @@ void __init mem_init(void)
 #else
 	totalram_pages = free_all_bootmem();
 #endif
-	reservedpages = end_pfn - totalram_pages - e820_hole_size(0, end_pfn);
+	reservedpages = end_pfn - totalram_pages -
+					absent_pages_in_range(0, end_pfn);
 
 	after_bootmem = 1;
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/k8topology.c linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/k8topology.c
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/k8topology.c	2006-05-25 02:50:17.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/k8topology.c	2006-06-05 14:16:49.000000000 +0100
@@ -146,6 +146,9 @@ int __init k8_scan_nodes(unsigned long s
 		
 		nodes[nodeid].start = base; 
 		nodes[nodeid].end = limit;
+		e820_register_active_regions(nodeid,
+				nodes[nodeid].start >> PAGE_SHIFT,
+				nodes[nodeid].end >> PAGE_SHIFT);
 
 		prevbase = base;
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/numa.c linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/numa.c
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/numa.c	2006-05-25 02:50:17.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/numa.c	2006-06-05 14:16:49.000000000 +0100
@@ -161,7 +161,7 @@ void __init setup_node_bootmem(int nodei
 					 bootmap_start >> PAGE_SHIFT, 
 					 start_pfn, end_pfn); 
 
-	e820_bootmem_free(NODE_DATA(nodeid), start, end);
+	free_bootmem_with_active_regions(nodeid, end);
 
 	reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys, pgdat_size); 
 	reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start, bootmap_pages<<PAGE_SHIFT);
@@ -175,13 +175,11 @@ void __init setup_node_bootmem(int nodei
 void __init setup_node_zones(int nodeid)
 { 
 	unsigned long start_pfn, end_pfn, memmapsize, limit;
-	unsigned long zones[MAX_NR_ZONES];
-	unsigned long holes[MAX_NR_ZONES];
 
  	start_pfn = node_start_pfn(nodeid);
  	end_pfn = node_end_pfn(nodeid);
 
-	Dprintk(KERN_INFO "Setting up node %d %lx-%lx\n",
+	Dprintk(KERN_INFO "Setting up memmap for node %d %lx-%lx\n",
 		nodeid, start_pfn, end_pfn);
 
 	/* Try to allocate mem_map at end to not fill up precious <4GB
@@ -195,10 +193,6 @@ void __init setup_node_zones(int nodeid)
 				round_down(limit - memmapsize, PAGE_SIZE), 
 				limit);
 #endif
-
-	size_zones(zones, holes, start_pfn, end_pfn);
-	free_area_init_node(nodeid, NODE_DATA(nodeid), zones,
-			    start_pfn, holes);
 } 
 
 void __init numa_init_array(void)
@@ -259,8 +253,11 @@ static int numa_emulation(unsigned long 
  		printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
  		return -1;
  	}
- 	for_each_online_node(i)
+ 	for_each_online_node(i) {
+		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+						nodes[i].end >> PAGE_SHIFT);
  		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+	}
  	numa_init_array();
  	return 0;
 }
@@ -299,6 +296,7 @@ void __init numa_initmem_init(unsigned l
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
 	node_to_cpumask[0] = cpumask_of_cpu(0);
+	e820_register_active_regions(0, start_pfn, end_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
 }
 
@@ -346,6 +344,8 @@ void __init paging_init(void)
 	for_each_online_node(i) {
 		setup_node_zones(i); 
 	}
+
+	free_area_init_nodes(MAX_DMA_PFN, MAX_DMA32_PFN, end_pfn, end_pfn);
 } 
 
 /* [numa=off] */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/srat.c linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/x86_64/mm/srat.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c	2006-06-05 14:16:49.000000000 +0100
@@ -91,6 +91,7 @@ static __init void bad_srat(void)
 		apicid_to_node[i] = NUMA_NO_NODE;
 	for (i = 0; i < MAX_NUMNODES; i++)
 		nodes_add[i].start = nodes[i].end = 0;
+	remove_all_active_ranges();
 }
 
 static __init inline int srat_disabled(void)
@@ -173,7 +174,7 @@ static int hotadd_enough_memory(struct b
 
 	if (mem < 0)
 		return 0;
-	allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
+	allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) * PAGE_SIZE;
 	allowed = (allowed / 100) * hotadd_percent;
 	if (allocated + mem > allowed) {
 		unsigned long range;
@@ -223,7 +224,7 @@ static int reserve_hotadd(int node, unsi
 	}
 
 	/* This check might be a bit too strict, but I'm keeping it for now. */
-	if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
+	if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
 		printk(KERN_ERR "SRAT: Hotplug area has existing memory\n");
 		return -1;
 	}
@@ -317,6 +318,8 @@ acpi_numa_memory_affinity_init(struct ac
 
 	printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
 	       nd->start, nd->end);
+	e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
+						nd->end >> PAGE_SHIFT);
 
 #ifdef RESERVE_HOTADD
  	if (ma->flags.hot_pluggable && reserve_hotadd(node, start, end) < 0) {
@@ -341,13 +344,13 @@ static int nodes_cover_memory(void)
 		unsigned long s = nodes[i].start >> PAGE_SHIFT;
 		unsigned long e = nodes[i].end >> PAGE_SHIFT;
 		pxmram += e - s;
-		pxmram -= e820_hole_size(s, e);
+		pxmram -= absent_pages_in_range(s, e);
 		pxmram -= nodes_add[i].end - nodes_add[i].start;
 		if ((long)pxmram < 0)
 			pxmram = 0;
 	}
 
-	e820ram = end_pfn - e820_hole_size(0, end_pfn);
+	e820ram = end_pfn - absent_pages_in_range(0, end_pfn);
 	/* We seem to lose 3 pages somewhere. Allow a bit of slack. */
 	if ((long)(e820ram - pxmram) >= 1*1024*1024) {
 		printk(KERN_ERR
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/include/asm-x86_64/e820.h linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/include/asm-x86_64/e820.h
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/include/asm-x86_64/e820.h	2006-05-25 02:50:17.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/include/asm-x86_64/e820.h	2006-06-05 14:16:49.000000000 +0100
@@ -50,10 +50,9 @@ extern void e820_print_map(char *who);
 extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type);
 extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type);
 
-extern void e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end);
 extern void e820_setup_gap(void);
-extern unsigned long e820_hole_size(unsigned long start_pfn,
-				    unsigned long end_pfn);
+extern void e820_register_active_regions(int nid,
+				unsigned long start_pfn, unsigned long end_pfn);
 
 extern void __init parse_memopt(char *p, char **end);
 extern void __init parse_memmapopt(char *p, char **end);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/include/asm-x86_64/proto.h linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/include/asm-x86_64/proto.h
--- linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/include/asm-x86_64/proto.h	2006-06-05 14:12:51.000000000 +0100
+++ linux-2.6.17-rc5-mm3-104-x86_64_use_init_nodes/include/asm-x86_64/proto.h	2006-06-05 14:16:49.000000000 +0100
@@ -24,8 +24,6 @@ extern void mtrr_bp_init(void);
 #define mtrr_bp_init() do {} while (0)
 #endif
 extern void init_memory_mapping(unsigned long start, unsigned long end);
-extern void size_zones(unsigned long *z, unsigned long *h,
-			unsigned long start_pfn, unsigned long end_pfn);
 
 extern void system_call(void); 
 extern int kernel_syscall(void);

^ permalink raw reply

* [PATCH 3/5] Have x86 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-06-06 13:48 UTC (permalink / raw)
  To: akpm
  Cc: davej, tony.luck, linuxppc-dev, Mel Gorman, linux-kernel,
	bob.picco, ak, linux-mm
In-Reply-To: <20060606134710.21419.48239.sendpatchset@skynet.skynet.ie>


Size zones and holes in an architecture independent manner for x86.


 Kconfig        |    8 +---
 kernel/setup.c |   19 +++------
 kernel/srat.c  |  100 +---------------------------------------------------
 mm/discontig.c |   65 +++++++--------------------------
 4 files changed, 25 insertions(+), 167 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/Kconfig linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/Kconfig
--- linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/Kconfig	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/Kconfig	2006-06-05 14:15:57.000000000 +0100
@@ -592,12 +592,10 @@ config ARCH_SELECT_MEMORY_MODEL
 config ARCH_ALIGNED_ZONE_BOUNDARIES
 	def_bool y
 
-source "mm/Kconfig"
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
 
-config HAVE_ARCH_EARLY_PFN_TO_NID
-	bool
-	default y
-	depends on NUMA
+source "mm/Kconfig"
 
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/kernel/setup.c linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/kernel/setup.c
--- linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/kernel/setup.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/kernel/setup.c	2006-06-05 14:15:57.000000000 +0100
@@ -1206,22 +1206,15 @@ static unsigned long __init setup_memory
 
 void __init zone_sizes_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
-	unsigned int max_dma, low;
+	unsigned int max_dma;
+#ifndef CONFIG_HIGHMEM
+	unsigned long highend_pfn = max_low_pfn;
+#endif
 
 	max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-	low = max_low_pfn;
 
-	if (low < max_dma)
-		zones_size[ZONE_DMA] = low;
-	else {
-		zones_size[ZONE_DMA] = max_dma;
-		zones_size[ZONE_NORMAL] = low - max_dma;
-#ifdef CONFIG_HIGHMEM
-		zones_size[ZONE_HIGHMEM] = highend_pfn - low;
-#endif
-	}
-	free_area_init(zones_size);
+	add_active_range(0, 0, highend_pfn);
+	free_area_init_nodes(max_dma, max_dma, max_low_pfn, highend_pfn);
 }
 #else
 extern unsigned long __init setup_memory(void);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/kernel/srat.c linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/kernel/srat.c
--- linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/kernel/srat.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/kernel/srat.c	2006-06-05 14:15:57.000000000 +0100
@@ -55,8 +55,6 @@ struct node_memory_chunk_s {
 static struct node_memory_chunk_s node_memory_chunk[MAXCHUNKS];
 
 static int num_memory_chunks;		/* total number of memory chunks */
-static int zholes_size_init;
-static unsigned long zholes_size[MAX_NUMNODES * MAX_NR_ZONES];
 
 extern void * boot_ioremap(unsigned long, unsigned long);
 
@@ -136,50 +134,6 @@ static void __init parse_memory_affinity
 		 "enabled and removable" : "enabled" ) );
 }
 
-#if MAX_NR_ZONES != 4
-#error "MAX_NR_ZONES != 4, chunk_to_zone requires review"
-#endif
-/* Take a chunk of pages from page frame cstart to cend and count the number
- * of pages in each zone, returned via zones[].
- */
-static __init void chunk_to_zones(unsigned long cstart, unsigned long cend, 
-		unsigned long *zones)
-{
-	unsigned long max_dma;
-	extern unsigned long max_low_pfn;
-
-	int z;
-	unsigned long rend;
-
-	/* FIXME: MAX_DMA_ADDRESS and max_low_pfn are trying to provide
-	 * similarly scoped information and should be handled in a consistant
-	 * manner.
-	 */
-	max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-
-	/* Split the hole into the zones in which it falls.  Repeatedly
-	 * take the segment in which the remaining hole starts, round it
-	 * to the end of that zone.
-	 */
-	memset(zones, 0, MAX_NR_ZONES * sizeof(long));
-	while (cstart < cend) {
-		if (cstart < max_dma) {
-			z = ZONE_DMA;
-			rend = (cend < max_dma)? cend : max_dma;
-
-		} else if (cstart < max_low_pfn) {
-			z = ZONE_NORMAL;
-			rend = (cend < max_low_pfn)? cend : max_low_pfn;
-
-		} else {
-			z = ZONE_HIGHMEM;
-			rend = cend;
-		}
-		zones[z] += rend - cstart;
-		cstart = rend;
-	}
-}
-
 /*
  * The SRAT table always lists ascending addresses, so can always
  * assume that the first "start" address that you see is the real
@@ -224,7 +178,6 @@ static int __init acpi20_parse_srat(stru
 
 	memset(pxm_bitmap, 0, sizeof(pxm_bitmap));	/* init proximity domain bitmap */
 	memset(node_memory_chunk, 0, sizeof(node_memory_chunk));
-	memset(zholes_size, 0, sizeof(zholes_size));
 
 	num_memory_chunks = 0;
 	while (p < end) {
@@ -288,6 +241,7 @@ static int __init acpi20_parse_srat(stru
 		printk("chunk %d nid %d start_pfn %08lx end_pfn %08lx\n",
 		       j, chunk->nid, chunk->start_pfn, chunk->end_pfn);
 		node_read_chunk(chunk->nid, chunk);
+		add_active_range(chunk->nid, chunk->start_pfn, chunk->end_pfn);
 	}
  
 	for_each_online_node(nid) {
@@ -404,57 +358,7 @@ int __init get_memcfg_from_srat(void)
 		return acpi20_parse_srat((struct acpi_table_srat *)header);
 	}
 out_err:
+	remove_all_active_ranges();
 	printk("failed to get NUMA memory information from SRAT table\n");
 	return 0;
 }
-
-/* For each node run the memory list to determine whether there are
- * any memory holes.  For each hole determine which ZONE they fall
- * into.
- *
- * NOTE#1: this requires knowledge of the zone boundries and so
- * _cannot_ be performed before those are calculated in setup_memory.
- * 
- * NOTE#2: we rely on the fact that the memory chunks are ordered by
- * start pfn number during setup.
- */
-static void __init get_zholes_init(void)
-{
-	int nid;
-	int c;
-	int first;
-	unsigned long end = 0;
-
-	for_each_online_node(nid) {
-		first = 1;
-		for (c = 0; c < num_memory_chunks; c++){
-			if (node_memory_chunk[c].nid == nid) {
-				if (first) {
-					end = node_memory_chunk[c].end_pfn;
-					first = 0;
-
-				} else {
-					/* Record any gap between this chunk
-					 * and the previous chunk on this node
-					 * against the zones it spans.
-					 */
-					chunk_to_zones(end,
-						node_memory_chunk[c].start_pfn,
-						&zholes_size[nid * MAX_NR_ZONES]);
-				}
-			}
-		}
-	}
-}
-
-unsigned long * __init get_zholes_size(int nid)
-{
-	if (!zholes_size_init) {
-		zholes_size_init++;
-		get_zholes_init();
-	}
-	if (nid >= MAX_NUMNODES || !node_online(nid))
-		printk("%s: nid = %d is invalid/offline. num_online_nodes = %d",
-		       __FUNCTION__, nid, num_online_nodes());
-	return &zholes_size[nid * MAX_NR_ZONES];
-}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/mm/discontig.c linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/mm/discontig.c
--- linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/i386/mm/discontig.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-103-x86_use_init_nodes/arch/i386/mm/discontig.c	2006-06-05 14:15:57.000000000 +0100
@@ -157,21 +157,6 @@ static void __init find_max_pfn_node(int
 		BUG();
 }
 
-/* Find the owning node for a pfn. */
-int early_pfn_to_nid(unsigned long pfn)
-{
-	int nid;
-
-	for_each_node(nid) {
-		if (node_end_pfn[nid] == 0)
-			break;
-		if (node_start_pfn[nid] <= pfn && node_end_pfn[nid] >= pfn)
-			return nid;
-	}
-
-	return 0;
-}
-
 /* 
  * Allocate memory for the pg_data_t for this node via a crude pre-bootmem
  * method.  For node zero take this from the bottom of memory, for
@@ -227,6 +212,8 @@ static unsigned long calculate_numa_rema
 	unsigned long pfn;
 
 	for_each_online_node(nid) {
+		unsigned old_end_pfn = node_end_pfn[nid];
+
 		/*
 		 * The acpi/srat node info can show hot-add memroy zones
 		 * where memory could be added but not currently present.
@@ -276,6 +263,7 @@ static unsigned long calculate_numa_rema
 
 		node_end_pfn[nid] -= size;
 		node_remap_start_pfn[nid] = node_end_pfn[nid];
+		shrink_active_range(nid, old_end_pfn, node_end_pfn[nid]);
 	}
 	printk("Reserving total of %ld pages for numa KVA remap\n",
 			reserve_pages);
@@ -355,45 +343,20 @@ unsigned long __init setup_memory(void)
 void __init zone_sizes_init(void)
 {
 	int nid;
+	unsigned long max_dma_pfn;
 
-
-	for_each_online_node(nid) {
-		unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
-		unsigned long *zholes_size;
-		unsigned int max_dma;
-
-		unsigned long low = max_low_pfn;
-		unsigned long start = node_start_pfn[nid];
-		unsigned long high = node_end_pfn[nid];
-
-		max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-
-		if (node_has_online_mem(nid)){
-			if (start > low) {
-#ifdef CONFIG_HIGHMEM
-				BUG_ON(start > high);
-				zones_size[ZONE_HIGHMEM] = high - start;
-#endif
-			} else {
-				if (low < max_dma)
-					zones_size[ZONE_DMA] = low;
-				else {
-					BUG_ON(max_dma > low);
-					BUG_ON(low > high);
-					zones_size[ZONE_DMA] = max_dma;
-					zones_size[ZONE_NORMAL] = low - max_dma;
-#ifdef CONFIG_HIGHMEM
-					zones_size[ZONE_HIGHMEM] = high - low;
-#endif
-				}
-			}
+	/* If SRAT has not registered memory, register it now */
+	if (find_max_pfn_with_active_regions() == 0) {
+		for_each_online_node(nid) {
+			if (node_has_online_mem(nid))
+				add_active_range(nid, node_start_pfn[nid],
+							node_end_pfn[nid]);
 		}
-
-		zholes_size = get_zholes_size(nid);
-
-		free_area_init_node(nid, NODE_DATA(nid), zones_size, start,
-				zholes_size);
 	}
+
+	max_dma_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
+	free_area_init_nodes(max_dma_pfn, max_dma_pfn,
+						max_low_pfn, highend_pfn);
 	return;
 }
 

^ permalink raw reply

* [PATCH 2/5] Have Power use add_active_range() and free_area_init_nodes()
From: Mel Gorman @ 2006-06-06 13:47 UTC (permalink / raw)
  To: akpm
  Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
	linux-kernel, linuxppc-dev
In-Reply-To: <20060606134710.21419.48239.sendpatchset@skynet.skynet.ie>


Size zones and holes in an architecture independent manner for Power.


 powerpc/Kconfig   |    7 --
 powerpc/mm/mem.c  |   53 ++++++----------
 powerpc/mm/numa.c |  157 ++++---------------------------------------------
 ppc/Kconfig       |    3 
 ppc/mm/init.c     |   26 ++++----
 5 files changed, 56 insertions(+), 190 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/powerpc/Kconfig linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/powerpc/Kconfig
--- linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/powerpc/Kconfig	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/powerpc/Kconfig	2006-06-05 14:15:06.000000000 +0100
@@ -692,11 +692,10 @@ config ARCH_SPARSEMEM_DEFAULT
 	def_bool y
 	depends on SMP && PPC_PSERIES
 
-source "mm/Kconfig"
-
-config HAVE_ARCH_EARLY_PFN_TO_NID
+config ARCH_POPULATES_NODE_MAP
 	def_bool y
-	depends on NEED_MULTIPLE_NODES
+
+source "mm/Kconfig"
 
 config ARCH_MEMORY_PROBE
 	def_bool y
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/powerpc/mm/mem.c linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/powerpc/mm/mem.c
--- linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/powerpc/mm/mem.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/powerpc/mm/mem.c	2006-06-05 14:15:06.000000000 +0100
@@ -257,20 +257,22 @@ void __init do_init_bootmem(void)
 
 	boot_mapsize = init_bootmem(start >> PAGE_SHIFT, total_pages);
 
+	/* Add active regions with valid PFNs */
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		unsigned long start_pfn, end_pfn;
+		start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
+		end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
+		add_active_range(0, start_pfn, end_pfn);
+	}
+
 	/* Add all physical memory to the bootmem map, mark each area
 	 * present.
 	 */
-	for (i = 0; i < lmb.memory.cnt; i++) {
-		unsigned long base = lmb.memory.region[i].base;
-		unsigned long size = lmb_size_bytes(&lmb.memory, i);
 #ifdef CONFIG_HIGHMEM
-		if (base >= total_lowmem)
-			continue;
-		if (base + size > total_lowmem)
-			size = total_lowmem - base;
+	free_bootmem_with_active_regions(0, total_lowmem >> PAGE_SHIFT);
+#else
+	free_bootmem_with_active_regions(0, max_pfn);
 #endif
-		free_bootmem(base, size);
-	}
 
 	/* reserve the sections we're already using */
 	for (i = 0; i < lmb.reserved.cnt; i++)
@@ -278,9 +280,8 @@ void __init do_init_bootmem(void)
 				lmb_size_bytes(&lmb.reserved, i));
 
 	/* XXX need to clip this if using highmem? */
-	for (i = 0; i < lmb.memory.cnt; i++)
-		memory_present(0, lmb_start_pfn(&lmb.memory, i),
-			       lmb_end_pfn(&lmb.memory, i));
+	sparse_memory_present_with_active_regions(0);
+
 	init_bootmem_done = 1;
 }
 
@@ -289,8 +290,6 @@ void __init do_init_bootmem(void)
  */
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES];
-	unsigned long zholes_size[MAX_NR_ZONES];
 	unsigned long total_ram = lmb_phys_mem_size();
 	unsigned long top_of_ram = lmb_end_of_DRAM();
 
@@ -308,26 +307,18 @@ void __init paging_init(void)
 	       top_of_ram, total_ram);
 	printk(KERN_DEBUG "Memory hole size: %ldMB\n",
 	       (top_of_ram - total_ram) >> 20);
-	/*
-	 * All pages are DMA-able so we put them all in the DMA zone.
-	 */
-	memset(zones_size, 0, sizeof(zones_size));
-	memset(zholes_size, 0, sizeof(zholes_size));
-
-	zones_size[ZONE_DMA] = top_of_ram >> PAGE_SHIFT;
-	zholes_size[ZONE_DMA] = (top_of_ram - total_ram) >> PAGE_SHIFT;
-
 #ifdef CONFIG_HIGHMEM
-	zones_size[ZONE_DMA] = total_lowmem >> PAGE_SHIFT;
-	zones_size[ZONE_HIGHMEM] = (total_memory - total_lowmem) >> PAGE_SHIFT;
-	zholes_size[ZONE_HIGHMEM] = (top_of_ram - total_ram) >> PAGE_SHIFT;
+	free_area_init_nodes(total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT);
 #else
-	zones_size[ZONE_DMA] = top_of_ram >> PAGE_SHIFT;
-	zholes_size[ZONE_DMA] = (top_of_ram - total_ram) >> PAGE_SHIFT;
-#endif /* CONFIG_HIGHMEM */
+	free_area_init_nodes(top_of_ram >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT);
+#endif
 
-	free_area_init_node(0, NODE_DATA(0), zones_size,
-			    __pa(PAGE_OFFSET) >> PAGE_SHIFT, zholes_size);
 }
 #endif /* ! CONFIG_NEED_MULTIPLE_NODES */
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/powerpc/mm/numa.c linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/powerpc/mm/numa.c
--- linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/powerpc/mm/numa.c	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/powerpc/mm/numa.c	2006-06-05 14:15:06.000000000 +0100
@@ -39,96 +39,6 @@ static bootmem_data_t __initdata plat_no
 static int min_common_depth;
 static int n_mem_addr_cells, n_mem_size_cells;
 
-/*
- * We need somewhere to store start/end/node for each region until we have
- * allocated the real node_data structures.
- */
-#define MAX_REGIONS	(MAX_LMB_REGIONS*2)
-static struct {
-	unsigned long start_pfn;
-	unsigned long end_pfn;
-	int nid;
-} init_node_data[MAX_REGIONS] __initdata;
-
-int __init early_pfn_to_nid(unsigned long pfn)
-{
-	unsigned int i;
-
-	for (i = 0; init_node_data[i].end_pfn; i++) {
-		unsigned long start_pfn = init_node_data[i].start_pfn;
-		unsigned long end_pfn = init_node_data[i].end_pfn;
-
-		if ((start_pfn <= pfn) && (pfn < end_pfn))
-			return init_node_data[i].nid;
-	}
-
-	return -1;
-}
-
-void __init add_region(unsigned int nid, unsigned long start_pfn,
-		       unsigned long pages)
-{
-	unsigned int i;
-
-	dbg("add_region nid %d start_pfn 0x%lx pages 0x%lx\n",
-		nid, start_pfn, pages);
-
-	for (i = 0; init_node_data[i].end_pfn; i++) {
-		if (init_node_data[i].nid != nid)
-			continue;
-		if (init_node_data[i].end_pfn == start_pfn) {
-			init_node_data[i].end_pfn += pages;
-			return;
-		}
-		if (init_node_data[i].start_pfn == (start_pfn + pages)) {
-			init_node_data[i].start_pfn -= pages;
-			return;
-		}
-	}
-
-	/*
-	 * Leave last entry NULL so we dont iterate off the end (we use
-	 * entry.end_pfn to terminate the walk).
-	 */
-	if (i >= (MAX_REGIONS - 1)) {
-		printk(KERN_ERR "WARNING: too many memory regions in "
-				"numa code, truncating\n");
-		return;
-	}
-
-	init_node_data[i].start_pfn = start_pfn;
-	init_node_data[i].end_pfn = start_pfn + pages;
-	init_node_data[i].nid = nid;
-}
-
-/* We assume init_node_data has no overlapping regions */
-void __init get_region(unsigned int nid, unsigned long *start_pfn,
-		       unsigned long *end_pfn, unsigned long *pages_present)
-{
-	unsigned int i;
-
-	*start_pfn = -1UL;
-	*end_pfn = *pages_present = 0;
-
-	for (i = 0; init_node_data[i].end_pfn; i++) {
-		if (init_node_data[i].nid != nid)
-			continue;
-
-		*pages_present += init_node_data[i].end_pfn -
-			init_node_data[i].start_pfn;
-
-		if (init_node_data[i].start_pfn < *start_pfn)
-			*start_pfn = init_node_data[i].start_pfn;
-
-		if (init_node_data[i].end_pfn > *end_pfn)
-			*end_pfn = init_node_data[i].end_pfn;
-	}
-
-	/* We didnt find a matching region, return start/end as 0 */
-	if (*start_pfn == -1UL)
-		*start_pfn = 0;
-}
-
 static void __cpuinit map_cpu_to_node(int cpu, int node)
 {
 	numa_cpu_lookup_table[cpu] = node;
@@ -471,8 +381,8 @@ new_range:
 				continue;
 		}
 
-		add_region(nid, start >> PAGE_SHIFT,
-			   size >> PAGE_SHIFT);
+		add_active_range(nid, start >> PAGE_SHIFT,
+				(start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
 
 		if (--ranges)
 			goto new_range;
@@ -485,6 +395,7 @@ static void __init setup_nonnuma(void)
 {
 	unsigned long top_of_ram = lmb_end_of_DRAM();
 	unsigned long total_ram = lmb_phys_mem_size();
+	unsigned long start_pfn, end_pfn;
 	unsigned int i;
 
 	printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
@@ -492,9 +403,11 @@ static void __init setup_nonnuma(void)
 	printk(KERN_DEBUG "Memory hole size: %ldMB\n",
 	       (top_of_ram - total_ram) >> 20);
 
-	for (i = 0; i < lmb.memory.cnt; ++i)
-		add_region(0, lmb.memory.region[i].base >> PAGE_SHIFT,
-			   lmb_size_pages(&lmb.memory, i));
+	for (i = 0; i < lmb.memory.cnt; ++i) {
+		start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
+		end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
+		add_active_range(0, start_pfn, end_pfn);
+	}
 	node_set_online(0);
 }
 
@@ -632,11 +545,11 @@ void __init do_init_bootmem(void)
 			  (void *)(unsigned long)boot_cpuid);
 
 	for_each_online_node(nid) {
-		unsigned long start_pfn, end_pfn, pages_present;
+		unsigned long start_pfn, end_pfn;
 		unsigned long bootmem_paddr;
 		unsigned long bootmap_pages;
 
-		get_region(nid, &start_pfn, &end_pfn, &pages_present);
+		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 
 		/* Allocate the node structure node local if possible */
 		NODE_DATA(nid) = careful_allocation(nid,
@@ -669,19 +582,7 @@ void __init do_init_bootmem(void)
 		init_bootmem_node(NODE_DATA(nid), bootmem_paddr >> PAGE_SHIFT,
 				  start_pfn, end_pfn);
 
-		/* Add free regions on this node */
-		for (i = 0; init_node_data[i].end_pfn; i++) {
-			unsigned long start, end;
-
-			if (init_node_data[i].nid != nid)
-				continue;
-
-			start = init_node_data[i].start_pfn << PAGE_SHIFT;
-			end = init_node_data[i].end_pfn << PAGE_SHIFT;
-
-			dbg("free_bootmem %lx %lx\n", start, end - start);
-  			free_bootmem_node(NODE_DATA(nid), start, end - start);
-		}
+		free_bootmem_with_active_regions(nid, end_pfn);
 
 		/* Mark reserved regions on this node */
 		for (i = 0; i < lmb.reserved.cnt; i++) {
@@ -712,44 +613,14 @@ void __init do_init_bootmem(void)
 			}
 		}
 
-		/* Add regions into sparsemem */
-		for (i = 0; init_node_data[i].end_pfn; i++) {
-			unsigned long start, end;
-
-			if (init_node_data[i].nid != nid)
-				continue;
-
-			start = init_node_data[i].start_pfn;
-			end = init_node_data[i].end_pfn;
-
-			memory_present(nid, start, end);
-		}
+		sparse_memory_present_with_active_regions(nid);
 	}
 }
 
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES];
-	unsigned long zholes_size[MAX_NR_ZONES];
-	int nid;
-
-	memset(zones_size, 0, sizeof(zones_size));
-	memset(zholes_size, 0, sizeof(zholes_size));
-
-	for_each_online_node(nid) {
-		unsigned long start_pfn, end_pfn, pages_present;
-
-		get_region(nid, &start_pfn, &end_pfn, &pages_present);
-
-		zones_size[ZONE_DMA] = end_pfn - start_pfn;
-		zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] - pages_present;
-
-		dbg("free_area_init node %d %lx %lx (hole: %lx)\n", nid,
-		    zones_size[ZONE_DMA], start_pfn, zholes_size[ZONE_DMA]);
-
-		free_area_init_node(nid, NODE_DATA(nid), zones_size, start_pfn,
-				    zholes_size);
-	}
+	unsigned long end_pfn = lmb_end_of_DRAM() >> PAGE_SHIFT;
+	free_area_init_nodes(end_pfn, end_pfn, end_pfn, end_pfn);
 }
 
 static int __init early_numa(char *p)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/ppc/Kconfig linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/ppc/Kconfig
--- linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/ppc/Kconfig	2006-06-05 14:12:48.000000000 +0100
+++ linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/ppc/Kconfig	2006-06-05 14:15:06.000000000 +0100
@@ -953,6 +953,9 @@ config NR_CPUS
 config HIGHMEM
 	bool "High memory support"
 
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
+
 source kernel/Kconfig.hz
 source kernel/Kconfig.preempt
 source "mm/Kconfig"
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/ppc/mm/init.c linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/ppc/mm/init.c
--- linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/arch/ppc/mm/init.c	2006-05-25 02:50:17.000000000 +0100
+++ linux-2.6.17-rc5-mm3-102-powerpc_use_init_nodes/arch/ppc/mm/init.c	2006-06-05 14:15:06.000000000 +0100
@@ -359,8 +359,7 @@ void __init do_init_bootmem(void)
  */
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES], i;
-
+	unsigned long start_pfn, end_pfn;
 #ifdef CONFIG_HIGHMEM
 	map_page(PKMAP_BASE, 0, 0);	/* XXX gross */
 	pkmap_page_table = pte_offset_kernel(pmd_offset(pgd_offset_k
@@ -370,19 +369,22 @@ void __init paging_init(void)
 			(KMAP_FIX_BEGIN), KMAP_FIX_BEGIN), KMAP_FIX_BEGIN);
 	kmap_prot = PAGE_KERNEL;
 #endif /* CONFIG_HIGHMEM */
-
-	/*
-	 * All pages are DMA-able so we put them all in the DMA zone.
-	 */
-	zones_size[ZONE_DMA] = total_lowmem >> PAGE_SHIFT;
-	for (i = 1; i < MAX_NR_ZONES; i++)
-		zones_size[i] = 0;
+	/* All pages are DMA-able so we put them all in the DMA zone. */
+	start_pfn = __pa(PAGE_OFFSET) >> PAGE_SHIFT;
+	end_pfn = start_pfn + (total_memory >> PAGE_SHIFT);
+	add_active_range(0, start_pfn, end_pfn);
 
 #ifdef CONFIG_HIGHMEM
-	zones_size[ZONE_HIGHMEM] = (total_memory - total_lowmem) >> PAGE_SHIFT;
+	free_area_init_nodes(total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT);
+#else
+	free_area_init_nodes(total_memory >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT);
 #endif /* CONFIG_HIGHMEM */
-
-	free_area_init(zones_size);
 }
 
 void __init mem_init(void)

^ permalink raw reply

* [PATCH 1/5] Introduce mechanism for registering active regions of memory
From: Mel Gorman @ 2006-06-06 13:47 UTC (permalink / raw)
  To: akpm
  Cc: davej, tony.luck, linuxppc-dev, Mel Gorman, linux-kernel,
	bob.picco, ak, linux-mm
In-Reply-To: <20060606134710.21419.48239.sendpatchset@skynet.skynet.ie>


This patch defines the structure to represent an active range of page
frames within a node in an architecture independent manner. Architectures
are expected to register active ranges of PFNs using add_active_range(nid,
start_pfn, end_pfn) and call free_area_init_nodes() passing the PFNs of
the end of each zone.


 include/linux/mm.h     |   45 +++
 include/linux/mmzone.h |   10 
 mm/page_alloc.c        |  550 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 580 insertions(+), 25 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-clean/include/linux/mm.h linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/include/linux/mm.h
--- linux-2.6.17-rc5-mm3-clean/include/linux/mm.h	2006-06-05 14:12:51.000000000 +0100
+++ linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/include/linux/mm.h	2006-06-05 14:14:15.000000000 +0100
@@ -924,6 +924,51 @@ extern void free_area_init(unsigned long
 extern void free_area_init_node(int nid, pg_data_t *pgdat,
 	unsigned long * zones_size, unsigned long zone_start_pfn, 
 	unsigned long *zholes_size);
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/*
+ * With CONFIG_ARCH_POPULATES_NODE_MAP set, an architecture may initialise its
+ * zones, allocate the backing mem_map and account for memory holes in a more
+ * architecture independent manner. This is a substitute for creating the
+ * zone_sizes[] and zholes_size[] arrays and passing them to
+ * free_area_init_node()
+ *
+ * An architecture is expected to register range of page frames backed by
+ * physical memory with add_active_range() before calling
+ * free_area_init_nodes() passing in the PFN each zone ends at. At a basic
+ * usage, an architecture is expected to do something like
+ *
+ * for_each_valid_physical_page_range()
+ * 	add_active_range(node_id, start_pfn, end_pfn)
+ * free_area_init_nodes(max_dma, max_dma32, max_normal_pfn, max_highmem_pfn);
+ *
+ * If the architecture guarantees that there are no holes in the ranges
+ * registered with add_active_range(), free_bootmem_active_regions()
+ * will call free_bootmem_node() for each registered physical page range.
+ * Similarly sparse_memory_present_with_active_regions() calls
+ * memory_present() for each range when SPARSEMEM is enabled.
+ *
+ * See mm/page_alloc.c for more information on each function exposed by
+ * CONFIG_ARCH_POPULATES_NODE_MAP
+ */
+extern void free_area_init_nodes(unsigned long max_dma_pfn,
+					unsigned long max_dma32_pfn,
+					unsigned long max_low_pfn,
+					unsigned long max_high_pfn);
+extern void add_active_range(unsigned int nid, unsigned long start_pfn,
+					unsigned long end_pfn);
+extern void shrink_active_range(unsigned int nid, unsigned long old_end_pfn,
+						unsigned long new_end_pfn);
+extern void remove_all_active_ranges(void);
+extern unsigned long absent_pages_in_range(unsigned long start_pfn,
+						unsigned long end_pfn);
+extern void get_pfn_range_for_nid(unsigned int nid,
+			unsigned long *start_pfn, unsigned long *end_pfn);
+extern unsigned long find_min_pfn_with_active_regions(void);
+extern unsigned long find_max_pfn_with_active_regions(void);
+extern void free_bootmem_with_active_regions(int nid,
+						unsigned long max_low_pfn);
+extern void sparse_memory_present_with_active_regions(int nid);
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-clean/include/linux/mmzone.h linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/include/linux/mmzone.h
--- linux-2.6.17-rc5-mm3-clean/include/linux/mmzone.h	2006-06-05 14:12:51.000000000 +0100
+++ linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/include/linux/mmzone.h	2006-06-05 14:14:15.000000000 +0100
@@ -277,6 +277,13 @@ struct zonelist {
 	struct zone *zones[MAX_NUMNODES * MAX_NR_ZONES + 1]; // NULL delimited
 };
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+struct node_active_region {
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	int nid;
+};
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 
 /*
  * The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
@@ -484,7 +491,8 @@ extern struct zone *next_zone(struct zon
 
 #endif
 
-#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+#if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \
+	!defined(CONFIG_ARCH_POPULATES_NODE_MAP)
 #define early_pfn_to_nid(nid)  (0UL)
 #endif
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.17-rc5-mm3-clean/mm/page_alloc.c linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/mm/page_alloc.c
--- linux-2.6.17-rc5-mm3-clean/mm/page_alloc.c	2006-06-05 14:12:51.000000000 +0100
+++ linux-2.6.17-rc5-mm3-101-add_free_area_init_nodes/mm/page_alloc.c	2006-06-05 14:14:15.000000000 +0100
@@ -38,6 +38,8 @@
 #include <linux/vmalloc.h>
 #include <linux/mempolicy.h>
 #include <linux/stop_machine.h>
+#include <linux/sort.h>
+#include <linux/pfn.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -87,6 +89,33 @@ int min_free_kbytes = 1024;
 unsigned long __meminitdata nr_kernel_pages;
 unsigned long __meminitdata nr_all_pages;
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+  /*
+   * MAX_ACTIVE_REGIONS determines the maxmimum number of distinct
+   * ranges of memory (RAM) that may be registered with add_active_range().
+   * Ranges passed to add_active_range() will be merged if possible
+   * so the number of times add_active_range() can be called is
+   * related to the number of nodes and the number of holes
+   */
+  #ifdef CONFIG_MAX_ACTIVE_REGIONS
+    /* Allow an architecture to set MAX_ACTIVE_REGIONS to save memory */
+    #define MAX_ACTIVE_REGIONS CONFIG_MAX_ACTIVE_REGIONS
+  #else
+    #if MAX_NUMNODES >= 32
+      /* If there can be many nodes, allow up to 50 holes per node */
+      #define MAX_ACTIVE_REGIONS (MAX_NUMNODES*50)
+    #else
+      /* By default, allow up to 256 distinct regions */
+      #define MAX_ACTIVE_REGIONS 256
+    #endif
+  #endif
+
+  struct node_active_region __initdata early_node_map[MAX_ACTIVE_REGIONS];
+  int __initdata nr_nodemap_entries;
+  unsigned long __initdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
+  unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -1887,25 +1916,6 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
-static void __init calculate_zone_totalpages(struct pglist_data *pgdat,
-		unsigned long *zones_size, unsigned long *zholes_size)
-{
-	unsigned long realtotalpages, totalpages = 0;
-	int i;
-
-	for (i = 0; i < MAX_NR_ZONES; i++)
-		totalpages += zones_size[i];
-	pgdat->node_spanned_pages = totalpages;
-
-	realtotalpages = totalpages;
-	if (zholes_size)
-		for (i = 0; i < MAX_NR_ZONES; i++)
-			realtotalpages -= zholes_size[i];
-	pgdat->node_present_pages = realtotalpages;
-	printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
-}
-
-
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2223,6 +2233,272 @@ __meminit int init_currently_empty_zone(
 	return 0;
 }
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/*
+ * Basic iterator support. Return the first range of PFNs for a node
+ * Note: nid == MAX_NUMNODES returns first region regardless of node
+ */
+static int __init first_active_region_index_in_nid(int nid)
+{
+	int i;
+
+	for (i = 0; i < nr_nodemap_entries; i++)
+		if (nid == MAX_NUMNODES || early_node_map[i].nid == nid)
+			return i;
+
+	return -1;
+}
+
+/*
+ * Basic iterator support. Return the next active range of PFNs for a node
+ * Note: nid == MAX_NUMNODES returns next region regardles of node
+ */
+static int __init next_active_region_index_in_nid(int index, int nid)
+{
+	for (index = index + 1; index < nr_nodemap_entries; index++)
+		if (nid == MAX_NUMNODES || early_node_map[index].nid == nid)
+			return index;
+
+	return -1;
+}
+
+#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+/*
+ * Required by SPARSEMEM. Given a PFN, return what node the PFN is on.
+ * Architectures may implement their own version but if add_active_range()
+ * was used and there are no special requirements, this is a convenient
+ * alternative
+ */
+int __init early_pfn_to_nid(unsigned long pfn)
+{
+	int i;
+
+	for (i = 0; i < nr_nodemap_entries; i++) {
+		unsigned long start_pfn = early_node_map[i].start_pfn;
+		unsigned long end_pfn = early_node_map[i].end_pfn;
+
+		if (start_pfn <= pfn && pfn < end_pfn)
+			return early_node_map[i].nid;
+	}
+
+	return 0;
+}
+#endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
+
+/* Basic iterator support to walk early_node_map[] */
+#define for_each_active_range_index_in_nid(i, nid) \
+	for (i = first_active_region_index_in_nid(nid); i != -1; \
+				i = next_active_region_index_in_nid(i, nid))
+
+/**
+ * free_bootmem_with_active_regions - Call free_bootmem_node for each active range
+ * @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed
+ * @max_low_pfn: The highest PFN that till be passed to free_bootmem_node
+ *
+ * If an architecture guarantees that all ranges registered with
+ * add_active_ranges() contain no holes and may be freed, this
+ * this function may be used instead of calling free_bootmem() manually.
+ */
+void __init free_bootmem_with_active_regions(int nid,
+						unsigned long max_low_pfn)
+{
+	int i;
+
+	for_each_active_range_index_in_nid(i, nid) {
+		unsigned long size_pages = 0;
+		unsigned long end_pfn = early_node_map[i].end_pfn;
+
+		if (early_node_map[i].start_pfn >= max_low_pfn)
+			continue;
+
+		if (end_pfn > max_low_pfn)
+			end_pfn = max_low_pfn;
+
+		size_pages = end_pfn - early_node_map[i].start_pfn;
+		free_bootmem_node(NODE_DATA(early_node_map[i].nid),
+				PFN_PHYS(early_node_map[i].start_pfn),
+				size_pages << PAGE_SHIFT);
+	}
+}
+
+/**
+ * sparse_memory_present_with_active_regions - Call memory_present for each active range
+ * @nid: The node to call memory_present for. If MAX_NUMNODES, all nodes will be used
+ *
+ * If an architecture guarantees that all ranges registered with
+ * add_active_ranges() contain no holes and may be freed, this
+ * this function may be used instead of calling memory_present() manually.
+ */
+void __init sparse_memory_present_with_active_regions(int nid)
+{
+	int i;
+
+	for_each_active_range_index_in_nid(i, nid)
+		memory_present(early_node_map[i].nid,
+				early_node_map[i].start_pfn,
+				early_node_map[i].end_pfn);
+}
+
+/**
+ * get_pfn_range_for_nid - Return the start and end page frames for a node
+ * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned
+ * @start_pfn: Passed by reference. On return, it will have the node start_pfn
+ * @end_pfn: Passed by reference. On return, it will have the node end_pfn
+ *
+ * It returns the start and end page frame of a node based on information
+ * provided by an arch calling add_active_range(). If called for a node
+ * with no available memory, a warning is printed and the start and end
+ * PFNs will be 0
+ */
+void __init get_pfn_range_for_nid(unsigned int nid,
+			unsigned long *start_pfn, unsigned long *end_pfn)
+{
+	int i;
+	*start_pfn = -1UL;
+	*end_pfn = 0;
+
+	for_each_active_range_index_in_nid(i, nid) {
+		*start_pfn = min(*start_pfn, early_node_map[i].start_pfn);
+		*end_pfn = max(*end_pfn, early_node_map[i].end_pfn);
+	}
+
+	if (*start_pfn == -1UL) {
+		printk(KERN_WARNING "Node %u active with no memory\n", nid);
+		*start_pfn = 0;
+	}
+}
+
+/*
+ * Return the number of pages a zone spans in a node, including holes
+ * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node()
+ */
+unsigned long __init zone_spanned_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *ignored)
+{
+	unsigned long node_start_pfn, node_end_pfn;
+	unsigned long zone_start_pfn, zone_end_pfn;
+
+	/* Get the start and end of the node and zone */
+	get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
+	zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
+	zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+
+	/* Check that this node has pages within the zone's required range */
+	if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
+		return 0;
+
+	/* Move the zone boundaries inside the node if necessary */
+	zone_end_pfn = min(zone_end_pfn, node_end_pfn);
+	zone_start_pfn = max(zone_start_pfn, node_start_pfn);
+
+	/* Return the spanned pages */
+	return zone_end_pfn - zone_start_pfn;
+}
+
+/*
+ * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
+ * then all holes in the requested range will be accounted for
+ */
+unsigned long __init __absent_pages_in_range(int nid,
+				unsigned long range_start_pfn,
+				unsigned long range_end_pfn)
+{
+	int i = 0;
+	unsigned long prev_end_pfn = 0, hole_pages = 0;
+	unsigned long start_pfn;
+
+	/* Find the end_pfn of the first active range of pfns in the node */
+	i = first_active_region_index_in_nid(nid);
+	if (i == -1)
+		return 0;
+
+	prev_end_pfn = early_node_map[i].start_pfn;
+
+	/* Find all holes for the zone within the node */
+	for (; i != -1; i = next_active_region_index_in_nid(i, nid)) {
+
+		/* No need to continue if prev_end_pfn is outside the zone */
+		if (prev_end_pfn >= range_end_pfn)
+			break;
+
+		/* Make sure the end of the zone is not within the hole */
+		start_pfn = min(early_node_map[i].start_pfn, range_end_pfn);
+		prev_end_pfn = max(prev_end_pfn, range_start_pfn);
+
+		/* Update the hole size cound and move on */
+		if (start_pfn > range_start_pfn) {
+			BUG_ON(prev_end_pfn > start_pfn);
+			hole_pages += start_pfn - prev_end_pfn;
+		}
+		prev_end_pfn = early_node_map[i].end_pfn;
+	}
+
+	return hole_pages;
+}
+
+/**
+ * absent_pages_in_range - Return number of page frames in holes within a range
+ * @start_pfn: The start PFN to start searching for holes
+ * @end_pfn: The end PFN to stop searching for holes
+ *
+ * It returns the number of pages frames in memory holes within a range
+ */
+unsigned long __init absent_pages_in_range(unsigned long start_pfn,
+							unsigned long end_pfn)
+{
+	return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn);
+}
+
+/* Return the number of page frames in holes in a zone on a node */
+unsigned long __init zone_absent_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *ignored)
+{
+	return __absent_pages_in_range(nid,
+				arch_zone_lowest_possible_pfn[zone_type],
+				arch_zone_highest_possible_pfn[zone_type]);
+}
+#else
+static inline unsigned long zone_spanned_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *zones_size)
+{
+	return zones_size[zone_type];
+}
+
+static inline unsigned long zone_absent_pages_in_node(int nid,
+						unsigned long zone_type,
+						unsigned long *zholes_size)
+{
+	if (!zholes_size)
+		return 0;
+
+	return zholes_size[zone_type];
+}
+#endif
+
+static void __init calculate_node_totalpages(struct pglist_data *pgdat,
+		unsigned long *zones_size, unsigned long *zholes_size)
+{
+	unsigned long realtotalpages, totalpages = 0;
+	int i;
+
+	for (i = 0; i < MAX_NR_ZONES; i++)
+		totalpages += zone_spanned_pages_in_node(pgdat->node_id, i,
+								zones_size);
+	pgdat->node_spanned_pages = totalpages;
+
+	realtotalpages = totalpages;
+	for (i = 0; i < MAX_NR_ZONES; i++)
+		realtotalpages -=
+			zone_absent_pages_in_node(pgdat->node_id, i,
+								zholes_size);
+	pgdat->node_present_pages = realtotalpages;
+	printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id,
+							realtotalpages);
+}
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -2246,10 +2522,9 @@ static void __meminit free_area_init_cor
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize;
 
-		realsize = size = zones_size[j];
-		if (zholes_size)
-			realsize -= zholes_size[j];
-
+		size = zone_spanned_pages_in_node(nid, j, zones_size);
+		realsize = size - zone_absent_pages_in_node(nid, j,
+								zholes_size);
 		if (j < ZONE_HIGHMEM)
 			nr_kernel_pages += realsize;
 		nr_all_pages += realsize;
@@ -2340,13 +2615,240 @@ void __meminit free_area_init_node(int n
 {
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
-	calculate_zone_totalpages(pgdat, zones_size, zholes_size);
+	calculate_node_totalpages(pgdat, zones_size, zholes_size);
 
 	alloc_node_mem_map(pgdat);
 
 	free_area_init_core(pgdat, zones_size, zholes_size);
 }
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/**
+ * add_active_range - Register a range of PFNs backed by physical memory
+ * @nid: The node ID the range resides on
+ * @start_pfn: The start PFN of the available physical memory
+ * @end_pfn: The end PFN of the available physical memory
+ *
+ * These ranges are stored in an early_node_map[] and later used by
+ * free_area_init_nodes() to calculate zone sizes and holes. If the
+ * range spans a memory hole, it is up to the architecture to ensure
+ * the memory is not freed by the bootmem allocator. If possible
+ * the range being registered will be merged with existing ranges.
+ */
+void __init add_active_range(unsigned int nid, unsigned long start_pfn,
+						unsigned long end_pfn)
+{
+	int i;
+
+	printk(KERN_DEBUG "Entering add_active_range(%d, %lu, %lu) "
+			  "%d entries of %d used\n",
+			  nid, start_pfn, end_pfn,
+			  nr_nodemap_entries, MAX_ACTIVE_REGIONS);
+
+	/* Merge with existing active regions if possible */
+	for (i = 0; i < nr_nodemap_entries; i++) {
+		if (early_node_map[i].nid != nid)
+			continue;
+
+		/* Skip if an existing region covers this new one */
+		if (start_pfn >= early_node_map[i].start_pfn &&
+				end_pfn <= early_node_map[i].end_pfn)
+			return;
+
+		/* Merge forward if suitable */
+		if (start_pfn <= early_node_map[i].end_pfn &&
+				end_pfn > early_node_map[i].end_pfn) {
+			early_node_map[i].end_pfn = end_pfn;
+			return;
+		}
+
+		/* Merge backward if suitable */
+		if (start_pfn < early_node_map[i].end_pfn &&
+				end_pfn >= early_node_map[i].start_pfn) {
+			early_node_map[i].start_pfn = start_pfn;
+			return;
+		}
+	}
+
+	/* Check that early_node_map is large enough */
+	if (i >= MAX_ACTIVE_REGIONS) {
+		printk(KERN_CRIT "More than %d memory regions, truncating\n",
+							MAX_ACTIVE_REGIONS);
+		return;
+	}
+
+	early_node_map[i].nid = nid;
+	early_node_map[i].start_pfn = start_pfn;
+	early_node_map[i].end_pfn = end_pfn;
+	nr_nodemap_entries = i + 1;
+}
+
+/**
+ * shrink_active_range - Shrink an existing registered range of PFNs
+ * @nid: The node id the range is on that should be shrunk
+ * @old_end_pfn: The old end PFN of the range
+ * @new_end_pfn: The new PFN of the range
+ *
+ * i386 with NUMA use alloc_remap() to store a node_mem_map on a local node.
+ * The map is kept at the end physical page range that has already been
+ * registered with add_active_range(). This function allows an arch to shrink
+ * an existing registered range.
+ */
+void __init shrink_active_range(unsigned int nid, unsigned long old_end_pfn,
+						unsigned long new_end_pfn)
+{
+	int i;
+
+	/* Find the old active region end and shrink */
+	for_each_active_range_index_in_nid(i, nid)
+		if (early_node_map[i].end_pfn == old_end_pfn) {
+			early_node_map[i].end_pfn = new_end_pfn;
+			break;
+		}
+}
+
+/**
+ * remove_all_active_ranges - Remove all currently registered regions
+ * During discovery, it may be found that a table like SRAT is invalid
+ * and an alternative discovery method must be used. This function removes
+ * all currently registered regions.
+ */
+void __init remove_all_active_ranges()
+{
+	memset(early_node_map, 0, sizeof(early_node_map));
+	nr_nodemap_entries = 0;
+}
+
+/* Compare two active node_active_regions */
+static int __init cmp_node_active_region(const void *a, const void *b)
+{
+	struct node_active_region *arange = (struct node_active_region *)a;
+	struct node_active_region *brange = (struct node_active_region *)b;
+
+	/* Done this way to avoid overflows */
+	if (arange->start_pfn > brange->start_pfn)
+		return 1;
+	if (arange->start_pfn < brange->start_pfn)
+		return -1;
+
+	return 0;
+}
+
+/* sort the node_map by start_pfn */
+static void __init sort_node_map(void)
+{
+	sort(early_node_map, (size_t)nr_nodemap_entries,
+			sizeof(struct node_active_region),
+			cmp_node_active_region, NULL);
+}
+
+/* Find the lowest pfn for a node. This depends on a sorted early_node_map */
+unsigned long __init find_min_pfn_for_node(unsigned long nid)
+{
+	int i;
+
+	/* Assuming a sorted map, the first range found has the starting pfn */
+	for_each_active_range_index_in_nid(i, nid)
+		return early_node_map[i].start_pfn;
+
+	printk(KERN_WARNING "Could not find start_pfn for node %lu\n", nid);
+	return 0;
+}
+
+/**
+ * find_min_pfn_with_active_regions - Find the minimum PFN registered
+ *
+ * It returns the minimum PFN based on information provided via
+ * add_active_range()
+ */
+unsigned long __init find_min_pfn_with_active_regions(void)
+{
+	return find_min_pfn_for_node(MAX_NUMNODES);
+}
+
+/**
+ * find_max_pfn_with_active_regions - Find the maximum PFN registered
+ *
+ * It returns the maximum PFN based on information provided via
+ * add_active_range()
+ */
+unsigned long __init find_max_pfn_with_active_regions(void)
+{
+	int i;
+	unsigned long max_pfn = 0;
+
+	for (i = 0; i < nr_nodemap_entries; i++)
+		max_pfn = max(max_pfn, early_node_map[i].end_pfn);
+
+	return max_pfn;
+}
+
+/**
+ * free_area_init_nodes - Initialise all pg_data_t and zone data
+ * @arch_max_dma_pfn: The maximum PFN usable for ZONE_DMA
+ * @arch_max_dma32_pfn: The maximum PFN usable for ZONE_DMA32
+ * @arch_max_low_pfn: The maximum PFN usable for ZONE_NORMAL
+ * @arch_max_high_pfn: The maximum PFN usable for ZONE_HIGHMEM
+ *
+ * This will call free_area_init_node() for each active node in the system.
+ * Using the page ranges provided by add_active_range(), the size of each
+ * zone in each node and their holes is calculated. If the maximum PFN
+ * between two adjacent zones match, it is assumed that the zone is empty.
+ * For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed
+ * that arch_max_dma32_pfn has no pages. It is also assumed that a zone
+ * starts where the previous one ended. For example, ZONE_DMA32 starts
+ * at arch_max_dma_pfn.
+ */
+void __init free_area_init_nodes(unsigned long arch_max_dma_pfn,
+				unsigned long arch_max_dma32_pfn,
+				unsigned long arch_max_low_pfn,
+				unsigned long arch_max_high_pfn)
+{
+	unsigned long nid;
+	int i;
+
+	/* Record where the zone boundaries are */
+	memset(arch_zone_lowest_possible_pfn, 0,
+				sizeof(arch_zone_lowest_possible_pfn));
+	memset(arch_zone_highest_possible_pfn, 0,
+				sizeof(arch_zone_highest_possible_pfn));
+	arch_zone_lowest_possible_pfn[ZONE_DMA] =
+					find_min_pfn_with_active_regions();
+	arch_zone_highest_possible_pfn[ZONE_DMA] = arch_max_dma_pfn;
+	arch_zone_highest_possible_pfn[ZONE_DMA32] = arch_max_dma32_pfn;
+	arch_zone_highest_possible_pfn[ZONE_NORMAL] = arch_max_low_pfn;
+	arch_zone_highest_possible_pfn[ZONE_HIGHMEM] = arch_max_high_pfn;
+	for (i = 1; i < MAX_NR_ZONES; i++)
+		arch_zone_lowest_possible_pfn[i] =
+			arch_zone_highest_possible_pfn[i-1];
+
+	/* Regions in the early_node_map can be in any order */
+	sort_node_map();
+
+	/* Print out the zone ranges */
+	printk("Zone PFN ranges:\n");
+	for (i = 0; i < MAX_NR_ZONES; i++)
+		printk("  %-8s %8lu -> %8lu\n",
+				zone_names[i],
+				arch_zone_lowest_possible_pfn[i],
+				arch_zone_highest_possible_pfn[i]);
+
+	/* Print out the early_node_map[] */
+	printk("early_node_map[%d] active PFN ranges\n", nr_nodemap_entries);
+	for (i = 0; i < nr_nodemap_entries; i++)
+		printk("  %3d: %8lu -> %8lu\n", early_node_map[i].nid,
+						early_node_map[i].start_pfn,
+						early_node_map[i].end_pfn);
+
+	/* Initialise every node */
+	for_each_online_node(nid) {
+		pg_data_t *pgdat = NODE_DATA(nid);
+		free_area_init_node(nid, pgdat, NULL,
+				find_min_pfn_for_node(nid), NULL);
+	}
+}
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 static bootmem_data_t contig_bootmem_data;
 struct pglist_data contig_page_data = { .bdata = &contig_bootmem_data };

^ permalink raw reply

* [PATCH 0/5] Sizing zones and holes in an architecture independent manner V7
From: Mel Gorman @ 2006-06-06 13:47 UTC (permalink / raw)
  To: akpm
  Cc: davej, tony.luck, linux-mm, Mel Gorman, ak, bob.picco,
	linux-kernel, linuxppc-dev

This is V7 of the patchset to size zones and memory holes in an
architecture-independent manner.

The notable change in this release is a fix of the ACPI bug on x86_64 from
V6. Christian Kujau has tested the various from rc4-mm1 to rc4-mm3 and found
that his ACPI problem got fixed in the later -mms for unknown reasons. He
tested rc4-mm3 with and without these patches and reported no problems. Bob
Picco also tested these patches and found that V6 had an ACPI problem with
mem=3GB (just below the ACPI data) but booted fine with this release. With
these tests, I think the patches are ready for another spin in -mm.

Changelog since V6
o MAX_ACTIVE_REGIONS is really maximum active regions, not MAX_ACTIVE_REGIONS-1
o MAX_ACTIVE_REGIONS is 256 unless the architecture specifically asks for
  a different number or MAX_NUMNODES is >= 32
o nr_nodemap_entries tracks the number of entries rather than terminating with
  end_pfn == 0
o Add number of documentation-related comments. Functions exposed by headers
  may potentially be picked up by kerneldoc
o Changed misleading zone_present_pages_in_node() name to
  zone_spanned_pages_in_node()
o Be a bit more verbose to help debugging when things go wrong.
o On x86_64, end_pfn_map now gets updated properly or ACPI tables get "lost"
o Signoffs added to patches 1 and 5 by Bob Picco related to contributions,
  fixes and reviews

Changelog since V5
o Add a missing #include to mm/mem_init.c
o Drop the verbose debugging part of the set
o Report active range registration when loglevel is set for KERN_DEBUG

Changelog since V4
o Rebase to 2.6.17-rc3-mm1
o Calculate holes on x86 with SRAT correctly

Changelog since V3
o Rebase to 2.6.17-rc2
o Allow the active regions to be cleared. Needed by x86_64 when it decides
  the SRAT table is bad half way through the registering of active regions
o Fix for flatmem x86_64 machines booting

Changelog since V2
o Fix a bug where holes in lower zones get double counted
o Catch the case where a new range is registered that is within an range
o Catch the case where a zone boundary is within a hole
o Use the EFI map for registering ranges on x86_64+numa
o On IA64+NUMA, add the active ranges before rounding for granules
o On x86_64, remove e820_hole_size and e820_bootmem_free and use
  arch-independent equivalents
o On x86_64, remove the map walk in e820_end_of_ram()
o Rename memory_present_with_active_regions, name ambiguous
o Add absent_pages_in_range() for arches to call

Changelog since V1
o Correctly convert virtual and physical addresses to PFNs on ia64
o Correctly convert physical addresses to PFN on older ppc 
o When add_active_range() is called with overlapping pfn ranges, merge them
o When a zone boundary occurs within a memory hole, account correctly
o Minor whitespace damage cleanup
o Debugging patch temporarily included

At a basic level, architectures define structures to record where active
ranges of page frames are located. Once located, the code to calculate
zone sizes and holes in each architecture is very similar.  Some of this
zone and hole sizing code is difficult to read for no good reason. This
set of patches eliminates the similar-looking architecture-specific code.

The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range(). When all areas
have been discovered, free_area_init_nodes() is called to initialise
the pgdat and zones. The zone sizes and holes are then calculated in an
architecture independent manner.

Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 134 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 142 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 78 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 57 arch-specific LOC removed

The patches have been successfully boot tested by me and verified that the
zones are the correct size on

o x86, flatmem with 1.5GiB of RAM
o x86, NUMAQ
o x86 with SRAT CONFIG_NUMA=n
o PPC64, NUMA
o PPC64, CONFIG_NUMA=n
o PPC64, CONFIG_64BIT=N
o x86_64, NUMA with SRAT
o x86_64, NUMA with broken SRAT that falls back to k8topology discovery
o x86_64, CONFIG_NUMA=n
o x86_64, CONFIG_64=n
o x86_64, CONFIG_64=n, CONFIG_NUMA=n
o x86_64, ACPI_NUMA, ACPI_MEMORY_HOTPLUG && !SPARSEMEM to trigger the
  hotadd path without sparsemem fun in srat.c (SRAT broken on test machine and
  I'm pretty sure the machine does not support physical memory hotadd anyway
  so test may not have been effective other than being a compile test.)
o ia64 (Itanium 2)
o ia64 (Itanium 2), CONFIG_64=N

Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig. Bob Picco has also tested and debugged
on IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine. These were on patches against 2.6.17-rc1 and release 3 of these
patches but there have been no ia64-changes since release 3.

There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory
holes but the architecture-independent code accounts the memory as present.

The big benefit of this set of patches is the reduction of 411 lines of
architecture-specific code, some of which is very hairy. There should be
a greater net reduction when other architectures use the same mechanisms
for zone and hole sizing but I lack the hardware to test on.

Additional credit;
	Dave Hansen for the initial suggestion and comments on early patches
	Andy Whitcroft for reviewing early versions and catching numerous errors
	Tony Luck for testing and debugging on IA64
	Bob Picco for fixing bugs related to pfn registration, reviewing a
		number of patch revisions, providing a number of suggestions on
		future direction and testing heavily
	Jack Steiner and Robin Holt for testing on IA64 and clarifying issues
		related to memory holes
	Yasunori for testing on IA64
	Andi Kleen for reviewing and feeding back about x86_64
	Christian Kujau for providing valuable information related to ACPI
		problems on x86_64 and testing potential fixes

 arch/i386/Kconfig           |    8 
 arch/i386/kernel/setup.c    |   19 
 arch/i386/kernel/srat.c     |  101 ---
 arch/i386/mm/discontig.c    |   65 --
 arch/ia64/Kconfig           |    3 
 arch/ia64/mm/contig.c       |   60 --
 arch/ia64/mm/discontig.c    |   41 -
 arch/ia64/mm/init.c         |   12 
 arch/powerpc/Kconfig        |   13 
 arch/powerpc/mm/mem.c       |   53 --
 arch/powerpc/mm/numa.c      |  157 ------
 arch/ppc/Kconfig            |    3 
 arch/ppc/mm/init.c          |   26 -
 arch/x86_64/Kconfig         |    3 
 arch/x86_64/kernel/e820.c   |  109 +---
 arch/x86_64/kernel/setup.c  |    7 
 arch/x86_64/mm/init.c       |   62 --
 arch/x86_64/mm/k8topology.c |    3 
 arch/x86_64/mm/numa.c       |   18 
 arch/x86_64/mm/srat.c       |   11 
 include/asm-ia64/meminit.h  |    1 
 include/asm-x86_64/e820.h   |    5 
 include/asm-x86_64/proto.h  |    2 
 include/linux/mm.h          |   34 +
 include/linux/mmzone.h      |   10 
 mm/Makefile                 |    2 
 mm/mem_init.c               | 1121 ++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             |  750 -----------------------------
-- 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* RE: Linux kernel thread with Linux 2.6.x
From: Laurent Lagrange @ 2006-06-06 13:39 UTC (permalink / raw)
  To: linuxppc-embedded


I have tried many solutions, the two last solutions (work_queue and
kernel_thread)
seem to have the same poor performances.

Does anyone have another idea ?
Thanks in advance.
Laurent

PS : eth_init_rx_bd function can allocate a new skbuff (YES parameter) or
re-use
the previous skbuff allocated to a rx buffer descriptor(NO parameter).

1) tasklet : runs in interrupt context and
then doesn't support semaphore down function.

2) work_queue : doesn't run in interrupt context
and supports semaphore down function.

static irqreturn_t eth_interrupt (int irq, void * dev_id, struct pt_regs *
regs)
{
    DRV_ETH_INFO    *info       = (DRV_ETH_INFO *)dev_id;
    unsigned short  events;

    /* get pending events */
    events = info->eth_reg->fcc_fcce & info->eth_reg->fcc_fccm;

    /* ack pending events */
    info->eth_reg->fcc_fcce = events;

    /* mask interrupts */
    info->eth_reg->fcc_fccm = ~events;

    /* check events */
    if (BTST(FCC_ENET_RXF, events))
tasklet_schedule(&Info_low->recvTasklet);
    if (BTST(FCC_ENET_TXE|FCC_ENET_TXB, events))
tasklet_schedule(&Info_low->xmitEndTasklet);

    return(IRQ_HANDLED);
}

static void eth_rx_event (DRV_ETH_INFO *info)
{
    ETH_BD_MGT_TABLE    *bd_table   = &info->eth_bd_table;
    ETH_BD_MGT          *bd_mgt     = &bd_table->rx[bd_table->rx_out];
    struct sk_buff      *skb;

    /* check empty status */
    while(BTST(BD_ENET_RX_EMPTY, bd_mgt->bd->cbd_sc) == 0) {
        /* check error and upper layer status */
        if
(BTST((BD_ENET_RX_LG|BD_ENET_RX_SH|BD_ENET_RX_NO|BD_ENET_RX_CR|BD_ENET_RX_OV
|BD_ENET_RX_CL), bd_mgt->bd->cbd_sc))    {
            /* re-enable bd with current buffer */
            eth_init_rx_bd(info, bd_mgt, NO); /* always return without error
*/
        } else {
            /* save current segment */
            skb         = (struct sk_buff *)bd_mgt->segment;
            skb->len    = bd_mgt->bd->cbd_datlen;

            /* get a new buffer for this bd */
            if (eth_init_rx_bd(info, bd_mgt, YES) == BRG_NO_ERROR) {
                /* fill sk_buff parameters */
	          skb->dev      = info->dev;
		    skb->protocol = eth_type_trans(skb, info->dev);

		    /* send frame to upper layers that use semaphores */
		    netif_packet(skb);
		} else {
                /* re-enable bd with current buffer */
                eth_init_rx_bd(info, bd_mgt, NO); /* always return without
error */
            }
        }

        /* set next index */
        bd_table->rx_out = (bd_table->rx_out == (ETH_RX_BD_NUMBER -
1))?0:(bd_table->rx_out + 1);

        /* get next bd mgt pointer */
        bd_mgt = &bd_table->rx[bd_table->rx_out];
    }

    /* enable interruption*/
    info->eth_reg->fcc_fccm |= FCC_ENET_RXF;

    return;
}

3) kernel_thread : doesn't run in interrupt context
and supports semaphore down function. I have tried to
change many task parameters :
    task->policy		= SCHED_RR;
    task->static_prio	= 100;
    task->prio		= 100;
    task->rt_priority	= 1;

To synchronize the rx interrruption and the thread I have tried completion
and semaphore mecanism without any difference

static irqreturn_t eth_interrupt (int irq, void * dev_id, struct pt_regs *
regs)
{
    DRV_ETH_INFO    *info       = (DRV_ETH_INFO *)dev_id;
    USHORT          events;

    /* get pending events */
    events = info->eth_reg->fcc_fcce & info->eth_reg->fcc_fccm;

    /* ack pending events */
    info->eth_reg->fcc_fcce = events;

    /* mask interrupts */
    info->eth_reg->fcc_fccm = ~events;

    /* check events */
    if (BTST(FCC_ENET_RXF, events)) osSemSignal(Info_low->recvSem);
    if (BTST(FCC_ENET_TXE|FCC_ENET_TXB, events))
osSemSignal(Info_low->xmitEndSem);

    return(IRQ_HANDLED);
}

unsigned eth_recv_task (void* data)
{
    DRV_ETH_INFO        *info       = (DRV_ETH_INFO
*)Info_low->drv_eth_info;
    ETH_BD_MGT_TABLE    *bd_table   = &info->eth_bd_table;
    ETH_BD_MGT          *bd_mgt     = &bd_table->rx[bd_table->rx_out];

    while(1) {
        /* wait interrupt */
        osSemWait(Info_low->recvSem, 0);

        /* check empty status */
        while(BTST(BD_ENET_RX_EMPTY, bd_mgt->bd->cbd_sc) == 0) {
            /* check error and upper layer status */
            if
(BTST((BD_ENET_RX_LG|BD_ENET_RX_SH|BD_ENET_RX_NO|BD_ENET_RX_CR|BD_ENET_RX_OV
|BD_ENET_RX_CL), bd_mgt->bd->cbd_sc))    {
                /* re-enable bd with current buffer */
                eth_init_rx_bd(info, bd_mgt, NO); /* always return without
error */
            } else {
                /* save current segment */
                skb         = (struct sk_buff *)bd_mgt->segment;
                skb->len    = bd_mgt->bd->cbd_datlen - 4;

            /* get a new buffer for this bd */
            if (eth_init_rx_bd(info, bd_mgt, YES) == BRG_NO_ERROR) {
                /* fill sk_buff parameters */
	          skb->dev      = info->dev;
		    skb->protocol = eth_type_trans(skb, info->dev);

		    /* send frame to upper layers that use semaphores */
		    netif_packet(skb);
		} else {
                /* re-enable bd with current buffer */
                eth_init_rx_bd(info, bd_mgt, NO); /* always return without
error */
            }

            /* set next index */
            bd_table->rx_out = (bd_table->rx_out == (ETH_RX_BD_NUMBER -
1))?0:(bd_table->rx_out + 1);

            /* get next bd mgt pointer */
            bd_mgt = &bd_table->rx[bd_table->rx_out];
        }

        /* enable interrupt */
        info->eth_reg->fcc_fccm |= FCC_ENET_RXF;
    }
    return(0);
}

^ permalink raw reply

* Re: [Alsa-devel] [RFC 5/8] snd-aoa: add codecs
From: Johannes Berg @ 2006-06-06 11:36 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linuxppc-dev, alsa-devel
In-Reply-To: <s5hpshr4oiy.wl%tiwai@suse.de>

[-- Attachment #1: Type: text/plain, Size: 287 bytes --]

On Fri, 2006-06-02 at 16:31 +0200, Takashi Iwai wrote:

> FYI:  We're currently implementing a dB conversion.  So, this will be 
> unneeded in future.

Cool, good to know :)

> But I'll included this as it is now at committing, and fix the dB
> things later.

Ok.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 793 bytes --]

^ permalink raw reply

* Re: [Alsa-devel] [RFC 4/8] snd-aoa: add i2sbus
From: Johannes Berg @ 2006-06-06 11:17 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linuxppc-dev, alsa-devel
In-Reply-To: <s5hr7274ox2.wl%tiwai@suse.de>

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

On Fri, 2006-06-02 at 16:23 +0200, Takashi Iwai wrote:
> > +	if (I2S_CLOCK_SPEED_18MHz % rate == 0) {
> > +		if ((I2S_CLOCK_SPEED_18MHz / rate) % mclk == 0) {
> 
> Equivalent with "I2S_CLOCK_SPEED_18MHZ % (rate * mclk) == 0" ?

Yeah, I guess, never really thought about that, just wrote it down the
way I thought to do it :) That said, I think it's more readable if
written that way, do you want me to change it regardless?

> > +	list_for_each_entry(cii, &sdev->codec_list, list) {
> > +		if (cii->codec->open) {
> > +			err = cii->codec->open(cii, pi->substream);
> > +			if (err) {
> > +				result = err;
> > +				goto out_unlock;
> 
> What happens if the first code is opened but fail the secondary?
> No need to close the first?

Yes, that needs to be done, good catch.

> > +	/* well, we really should support scatter/gather DMA */
> > +	/* FIXME FIXME FIXME: If this fails, we BUG() when the alsa layer
> > +	 * later tries to allocate memory. Apparently we should be setting
> > +	 * some device pointer for that ...
> > +	 */
> > +	snd_pcm_lib_preallocate_pages_for_all(
> > +		dev->pcm, SNDRV_DMA_TYPE_DEV,
> > +		snd_dma_pci_data(macio_get_pci_dev(i2sdev->macio)),
> > +		64 * 1024, 64 * 1024);
> 
> Is the comment true?  Yes, you have to set the device pointer via
> snd_pcm_lib_preallocate*().  But it must be OK even if preallocate
> fails.

Hah, I don't know actually, I didn't know you set the pointer using this
function, when I wrote the comment I just had forgotten the preallocate
call!
Does that mean that _preallocate_pages_for_all() has the side effect of
setting the pointer? If so, imho that's pretty bad.

> > +static irqreturn_t i2sbus_bus_intr(int irq, void *devid, struct pt_regs *regs)
> > +{
> > +	struct i2sbus_dev *dev = devid;
> > +	u32 intreg;
> > +
> > +	spin_lock(&dev->low_lock);
> > +	intreg = in_le32(&dev->intfregs->intr_ctl);
> > +
> > +	printk(KERN_INFO "i2sbus: interrupt, intr reg is 0x%x!\n", intreg);
> 
> Should this be really always printed?

no, gone.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 793 bytes --]

^ permalink raw reply

* RE: [PATCH/2.6.17-rc4 4/10]Powerpc:  Add tsi108 pic support
From: Benjamin Herrenschmidt @ 2006-06-06 10:17 UTC (permalink / raw)
  To: Zang Roy-r61911
  Cc: Alexandre Bounine, Yang Xin-Xin-r48390, Paul Mackerras,
	linuxppc-dev list
In-Reply-To: <9FCDBA58F226D911B202000BDBAD4673067CD108@zch01exm40.ap.freescale.net>

On Tue, 2006-06-06 at 17:43 +0800, Zang Roy-r61911 wrote:

> Update Tsi108 implementation of MPIC.
> Any comment? 
> 
> Integrate Tundra Semiconductor tsi108 host bridge interrupt controller 
> to mpic arch.

Looks much better :) Still a few things... 

> +	mpic = mpic_alloc(mpic_paddr,
> +			MPIC_PRIMARY | MPIC_BIG_ENDIAN | MPIC_WANTS_RESET |
> +			MPIC_SPV_EOI | MPIC_CASC_NOEOI | 
> +			MPIC_MOD_ID(MPIC_ID_TSI108),
> +			0, /* num_sources used */
> +			TSI108_IRQ_BASE,
> +			0, /* num_sources used */
> +			NR_IRQS - 4 /* XXXX */,
> +			mpc7448_hpc2_pic_initsenses,
> +			sizeof(mpc7448_hpc2_pic_initsenses), "Tsi108_PIC");

That's a hell lot of new flags... I'm not sure we need that many or a
single TSI108 one that encloses all the new ones. Also, I'm not sure we
need that model ID encoding thing. Let's do things simple, besides, I
don't want to encourage HW folks into doing the same kind of contraption
in the future (btw, tell the TSI folks for me that they had a BAD BAD
BAD idea to muck around with the base design that way, especially
changing the register map in incompatible ways for no good reason).

> +	/* Configure MPIC outputs to CPU0 */
> +	tsi108_write_reg(TSI108_MPIC_OFFSET + 0x30c, 0);
>  }

It doesn't use the standard multiple processor outputs mecanism of
MPIC ?
 
> +static struct mpic_info mpic_infos[] = {
> +	[0] = {	/* Original OpenPIC compatible MPIC */
> +	.greg_base	= MPIC_GREG_BASE,
> +	.greg_frr0	= MPIC_GREG_FEATURE_0,
> +	.greg_config0	= MPIC_GREG_GLOBAL_CONF_0,
> +	.greg_vendor_id	= MPIC_GREG_VENDOR_ID,
> +	.greg_ipi_vp0	= MPIC_GREG_IPI_VECTOR_PRI_0,
> +	.greg_ipi_stride	= MPIC_GREG_IPI_STRIDE,
> +	.greg_spurious	= MPIC_GREG_SPURIOUS,
> +	.greg_tfrr	= MPIC_GREG_TIMER_FREQ,
> +

   .../...

It's a bit sad to have to go all the way to doing such tables, but I
suspect it's probably the best way to handle it at this point. Send more
nastygrams to the HW folks for me.

>  	mpic->num_sources = 0; /* so far */
>  	mpic->senses = senses;
>  	mpic->senses_count = senses_count;
> +	mpic->hw_set = &mpic_infos[MPIC_GET_MOD_ID(flags)];

Well... the model ID thing might not be that a bad idea in the end :) I
need to think about it. I might have to deal with yet another MPIC that
has another regiser map (yeah yeah, TSI aren't the only ones to not get
it)... 

  .../...

> @@ -963,7 +1043,7 @@ int mpic_get_one_irq(struct mpic *mpic, 
>  {
>  	u32 irq;
>  
> -	irq = mpic_cpu_read(MPIC_CPU_INTACK) & MPIC_VECPRI_VECTOR_MASK;
> +	irq = mpic_cpu_read(mpic->hw_set->cpu_intack) & mpic->hw_set->irq_vpr_vector;
>  #ifdef DEBUG_LOW
>  	DBG("%s: get_one_irq(): %d\n", mpic->name, irq);
>  #endif
> @@ -972,11 +1052,18 @@ #ifdef DEBUG_LOW
>  		DBG("%s: cascading ...\n", mpic->name);
>  #endif
>  		irq = mpic->cascade(regs, mpic->cascade_data);
> -		mpic_eoi(mpic);
> +#ifdef DEBUG_LOW
> +		DBG("%s: cascaded irq: %d\n", mpic->name, irq);
> +#endif
> +		if (!(mpic->flags & MPIC_CASC_NOEOI))
> +			mpic_eoi(mpic);
>  		return irq;
>  	}

Can you tell me why you need the above ? (Why you aren't EOI'ing the
cascade ?) Note that the cascade handling is going away from mpic anyway
with the port to genirq that I'll publish later this week for 2.6.18 and
it will almost be handled as a normal interrupt...

> -	if (unlikely(irq == MPIC_VEC_SPURRIOUS))
> +	if (unlikely(irq == MPIC_VEC_SPURRIOUS)) {
> +		if (mpic->flags & MPIC_SPV_EOI)
> +			mpic_eoi(mpic);
>  		return -1;
> +	}

I think the above thing could just test the model ID. It's unlikely that
another implementation need the same "feature", so just test the model
ID rather than adding a flag and if we ever have another model with the
same "feature", then we'll go back to adding a flag :)

Cheers,
Ben.

^ permalink raw reply

* RE: eth0: tx queue full
From: Li Yang-r58472 @ 2006-06-06 10:05 UTC (permalink / raw)
  To: 'salvatore cusenza', linuxppc-embedded

Can you repeat the problem?  Kernel 2.4.20 is considerable old even for 
2.4 series.  Could you try to replace the fec driver from latest 2.4 kernel?

Best Regards,
Leo

> -----Original Message-----
> From: linuxppc-embedded-bounces+leoli=freescale.com@ozlabs.org
> [mailto:linuxppc-embedded-bounces+leoli=freescale.com@ozlabs.org] On Behalf
> Of salvatore cusenza
> Sent: Tuesday, June 06, 2006 4:14 PM
> To: linuxppc-embedded@ozlabs.org
> Subject: eth0: tx queue full
> 
> At runtime during the usual life of my board (MPC852 and linux-2.4.20 Denk's
> distribution)
>  I have experienced the following crash:
> 
> 
> eth0: tx queue full!.
> eth0: tx queue full!.
> eth0: tx queue full!.
> 
> Oops: kernel access of bad area, sig: 11
> NIP: C000D440 XER: 00000000 LR: C00BB040 SP: C0C9BC10 REGS: c0c9bb60 TRAP: 0300
> Tainted: P
> MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> DAR: 00001F9D, DSISR: 000000E4
> TASK = c0c9a000[145] 'L5421' Last syscall: 4
> last math 00000000 last altivec 00000000
> GPR00: 00000000 C0C9BC10 C0C9A000 C0F56D70 00001F99 0000003C C0F56D6C 00000007
> GPR08: 00000001 0000003C 00000000 C0F56DB0 C0D83C3C 10071D28 00000000 C3120000
> GPR16: C311CB04 C311C8D8 C311C754 C0170000 C3120000 C311CB30 00000001 C0169DA0
> GPR24: F0000E00 00001F9D C0F58400 0000003C 00000040 C2080100 C0F58200 C0F501B0
> Call backtrace:
> C00BAF8C C00BABC8 C0005848 C3119448 C31194C0 C31194F8 C00066F8
> C0011A48 C00BA8FC C00CADD8 C00C3F00 C0016B50 C00AFE4C C00B4B94
> C00B5EF4 C003571C C000457C 0FFD5E4C 0FEDB8DC 0FEDB284 1003B5B8
> 1003D558 1003A88C 0FED34A4 0FED32D0 0FFCFEE4 0FD5F590
> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
>  <0>Rebooting in 180 seconds..
> 
> 
> Could you suggest me something to investigate?
> 

^ permalink raw reply

* RE: [PATCH/2.6.17-rc4 4/10]Powerpc:  Add tsi108 pic support
From: Zang Roy-r61911 @ 2006-06-06  9:43 UTC (permalink / raw)
  To: Alexandre Bounine, Benjamin Herrenschmidt
  Cc: linuxppc-dev list, Paul Mackerras, Yang Xin-Xin-r48390

> The items 1 and 2 not needed. I moved them from the 
> tsi108_pic.c but really they have been leftovers from the old 
> code there. Originally code ppc/syslib/open_pic.c was used as 
> a template, which was good fit for the blocking output mode 
> of the tsi108 PIC (plus some extra tweaks). In that mode EOI 
> has to be signaled to PIC to deactivate interrupt line to the 
> CPU before it re-enables local interrupts. This is why we 
> have got ack routine. We changed mode later and removed most 
> of workarounds except these two.
> 
> I removed code for 1 and 2 and will send a patch to Roy after 
> retesting (probably this weekend with some other stuff).
> 
> Alex.
>               
> 
> On Thu, 2006-06-01 at 16:45 -0400, Alexandre Bounine wrote:
> > All differences in the Tsi108/109 PIC code from the 
> standard MPIC are caused by the HW behavior. The Tsi108/109 
> PIC looks like standard OpenPIC but, in fact, is different in 
> registers mapping and behavior. Its logic is close but not 
> exactly as MPIC.  
> > 
> > Here are replies on comments to the code:
> > 
> > 1.Why do you have to check if its a LEVEL irq?
> > 
> > Check for LEVEL irqs is required in the ack/end pair to 
> enable nested 
> > interrupt servicing and does not hang when core (local) 
> interrupts are 
> > re-enabled in the ISR. Otherwise we have to use 
> SA_INTERRUPT flag for all level signaled interrupts.
> 
> Can you be more precise about what exactly happens and why ? 
> Unless you EOI handling is broken of course, there should be 
> no need to do anything other than a single eoi in end(), period.
> 
> > 2. if the PIC works like other openpic's you dont need an 'ack' we 
> > handle it via 'end'
> > 
> > Tsi108/109 needs it.
> 
> What for ? Please, give the low level details.
> 
> > 3. why the changes to where we do mpic_eoi for TSI108?
> > The Tsi108 PIC requires EOI for spurious interrupt (as all 
> other interrupt sources).
> 
> Ok, that is acceptable.
> 
> > The do_IRQ() does not call end routine for spurious interrupts.  
> > 
> > The MPIC code patch for Tsi108/109 demonstrates HW 
> differences and why we originally considered having separate 
> code for Tsi108 pic.
> 
> Please tell me more about the HW differences :)
> 
> Ben.


Update Tsi108 implementation of MPIC.
Any comment? 

Integrate Tundra Semiconductor tsi108 host bridge interrupt controller 
to mpic arch.

Signed-off-by: Alexandre Bounine <alexandreb@tundra.com>
Signed-off-by: Roy Zang		<tie-fei.zang@freescale.com>

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7e4d38e..d8a9add 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -408,7 +408,8 @@ config U3_DART
 	default n
 
 config MPIC
-	depends on PPC_PSERIES || PPC_PMAC || PPC_MAPLE || PPC_CHRP
+	depends on PPC_PSERIES || PPC_PMAC || PPC_MAPLE || PPC_CHRP \
+			       || MPC7448HPC2
 	bool
 	default y
 
diff --git a/arch/powerpc/configs/mpc7448_hpc2_defconfig b/arch/powerpc/configs/mpc7448_hpc2_defconfig
index 28c7c8b..15a50f4 100644
--- a/arch/powerpc/configs/mpc7448_hpc2_defconfig
+++ b/arch/powerpc/configs/mpc7448_hpc2_defconfig
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.17-rc4
-# Tue May 23 11:29:48 2006
+# Sat May 27 18:45:55 2006
 #
 # CONFIG_PPC64 is not set
 CONFIG_PPC32=y
@@ -110,6 +110,7 @@ # CONFIG_PPC_MULTIPLATFORM is not set
 # CONFIG_PPC_ISERIES is not set
 CONFIG_EMBEDDED6xx=y
 # CONFIG_APUS is not set
+CONFIG_MPIC=y
 # CONFIG_PPC_RTAS is not set
 # CONFIG_MMIO_NVRAM is not set
 # CONFIG_PPC_MPC106 is not set
@@ -899,6 +900,7 @@ CONFIG_LOG_BUF_SHIFT=14
 # CONFIG_DEBUG_FS is not set
 # CONFIG_UNWIND_INFO is not set
 # CONFIG_BOOTX_TEXT is not set
+# CONFIG_SERIAL_TEXT_DEBUG is not set
 # CONFIG_PPC_EARLY_DEBUG_LPAR is not set
 # CONFIG_PPC_EARLY_DEBUG_G5 is not set
 # CONFIG_PPC_EARLY_DEBUG_RTAS is not set
diff --git a/arch/powerpc/platforms/embedded6xx/Kconfig b/arch/powerpc/platforms/embedded6xx/Kconfig
index e125ec6..2b78196 100644
--- a/arch/powerpc/platforms/embedded6xx/Kconfig
+++ b/arch/powerpc/platforms/embedded6xx/Kconfig
@@ -79,6 +79,7 @@ config MPC7448HPC2
 	select TSI108_BRIDGE
 	select DEFAULT_UIMAGE
 	select PPC_UDBG_16550
+	select MPIC
 	help
 	  Select MPC7448HPC2 if configuring for Freescale MPC7448HPC2 (Taiga)
 	  platform
diff --git a/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c b/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
index db5dc10..458f70d 100644
--- a/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
+++ b/arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c
@@ -43,7 +43,16 @@ #include <asm/pci-bridge.h>
 #include <asm/reg.h>
 #include <mm/mmu_decl.h>
 #include "mpc7448_hpc2.h"
-#include <asm/tsi108_pic.h>
+#include <asm/tsi108_irq.h>
+#include <asm/mpic.h>
+
+#undef DEBUG
+
+#ifdef DEBUG
+#define DBG(fmt...) do { printk(fmt); } while(0)
+#else
+#define DBG(fmt...) do { } while(0)
+#endif
 
 #ifndef CONFIG_PCI
 isa_io_base = MPC7448_HPC2_ISA_IO_BASE;
@@ -53,20 +62,8 @@ #endif
 
 extern int add_bridge(struct device_node *dev);
 extern void _nmask_and_or_msr(unsigned long nmask, unsigned long or_val);
-
-#ifdef TSI108_ETH
-hw_info hw_info_table[TSI108_ETH_MAX_PORTS + 1] = {
-	{TSI108_CSR_ADDR_PHYS + TSI108_ETH_OFFSET,
-	 TSI108_CSR_ADDR_PHYS + TSI108_ETH_OFFSET,
-	 TSI108_PHY0_ADDR, IRQ_TSI108_GIGE0},
-
-	{TSI108_CSR_ADDR_PHYS + TSI108_ETH_OFFSET + 0x400,
-	 TSI108_CSR_ADDR_PHYS + TSI108_ETH_OFFSET,
-	 TSI108_PHY1_ADDR, IRQ_TSI108_GIGE1},
-
-	{TBL_END, TBL_END, TBL_END, TBL_END}
-};
-#endif
+extern void tsi108_pci_int_init(void);
+extern int tsi108_irq_cascade(struct pt_regs *regs, void *unused);
 
 /*
  * Define all of the IRQ senses and polarities.  Taken from the
@@ -76,10 +73,32 @@ #endif
  */
 
 static u_char mpc7448_hpc2_pic_initsenses[] __initdata = {
+	/* External on-board sources */
 	(IRQ_SENSE_LEVEL | IRQ_POLARITY_NEGATIVE),	/* INT[0] XINT0 from FPGA */
 	(IRQ_SENSE_LEVEL | IRQ_POLARITY_NEGATIVE),	/* INT[1] XINT1 from FPGA */
 	(IRQ_SENSE_LEVEL | IRQ_POLARITY_NEGATIVE),	/* INT[2] PHY_INT from both GIGE */
 	(IRQ_SENSE_LEVEL | IRQ_POLARITY_NEGATIVE),	/* INT[3] RESERVED */
+	/* Internal Tsi108/109 interrupt sources */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Reserved IRQ */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Reserved IRQ */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Reserved IRQ */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Reserved IRQ */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* DMA0 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* DMA1 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* DMA2 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* DMA3 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* UART0 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* UART1 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* I2C */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* GPIO */
+	(IRQ_SENSE_LEVEL | IRQ_POLARITY_POSITIVE),	/* GIGE0 */
+	(IRQ_SENSE_LEVEL | IRQ_POLARITY_POSITIVE),	/* GIGE1 */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Reserved IRQ */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* HLP */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* SDC */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Processor IF */
+	(IRQ_SENSE_EDGE  | IRQ_POLARITY_POSITIVE),	/* Reserved IRQ */
+	(IRQ_SENSE_LEVEL | IRQ_POLARITY_POSITIVE),	/* PCI/X block */
 };
 
 /*
@@ -196,18 +215,44 @@ #endif
  */
 static void __init mpc7448_hpc2_init_IRQ(void)
 {
+	struct mpic *mpic;
+	phys_addr_t mpic_paddr = 0;
+	struct device_node *tsi_pic;
+
+	tsi_pic = of_find_node_by_type(NULL, "open-pic");
+	if (tsi_pic) {
+		unsigned int size;
+		void *prop = get_property(tsi_pic, "reg", &size);
+		mpic_paddr = of_translate_address(tsi_pic, prop);
+	}
 
-	tsi108_pic_init(mpc7448_hpc2_pic_initsenses);
+	if (mpic_paddr == 0) {
+		printk("%s: No tsi108 PIC found !\n", __FUNCTION__);
+		return;
+	}
 
-	/* Configure MPIC outputs to CPU0 */
-	tsi108_pic_set_output(0, IRQ_SENSE_EDGE, IRQ_POLARITY_NEGATIVE);
-}
+	DBG("%s: tsi108pic phys_addr = 0x%x\n", __FUNCTION__,
+	    (u32) mpic_paddr);
 
-static void __init mpc7448_hpc2_map_io(void)
-{
-	/* Tsi108 CSR mapping */
-	io_block_mapping(TSI108_CSR_ADDR_VIRT, TSI108_CSR_ADDR_PHYS,
-			 0x100000, _PAGE_IO);
+	mpic = mpic_alloc(mpic_paddr,
+			MPIC_PRIMARY | MPIC_BIG_ENDIAN | MPIC_WANTS_RESET |
+			MPIC_SPV_EOI | MPIC_CASC_NOEOI | 
+			MPIC_MOD_ID(MPIC_ID_TSI108),
+			0, /* num_sources used */
+			TSI108_IRQ_BASE,
+			0, /* num_sources used */
+			NR_IRQS - 4 /* XXXX */,
+			mpc7448_hpc2_pic_initsenses,
+			sizeof(mpc7448_hpc2_pic_initsenses), "Tsi108_PIC");
+
+	BUG_ON(mpic == NULL); /* XXXX */
+
+	mpic_init(mpic);
+	mpic_setup_cascade(IRQ_TSI108_PCI, tsi108_irq_cascade, mpic);
+	tsi108_pci_int_init();
+
+	/* Configure MPIC outputs to CPU0 */
+	tsi108_write_reg(TSI108_MPIC_OFFSET + 0x30c, 0);
 }
 
 void mpc7448_hpc2_show_cpuinfo(struct seq_file *m)
@@ -269,10 +314,9 @@ define_machine(mpc7448_hpc2){
 	.setup_arch 		= mpc7448_hpc2_setup_arch,
 	.init_IRQ 		= mpc7448_hpc2_init_IRQ,
 	.show_cpuinfo 		= mpc7448_hpc2_show_cpuinfo,
-	.get_irq 		= tsi108_pic_get_irq,
+	.get_irq 		= mpic_get_irq,
 	.restart 		= mpc7448_hpc2_restart,
 	.calibrate_decr 	= generic_calibrate_decr,
-	.setup_io_mappings 	= mpc7448_hpc2_map_io,
 	.machine_check_exception= mpc7448_machine_check_exception,
 	.progress 		= udbg_progress,
 };
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index 8c0afb7..048e1f6 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -8,4 +8,4 @@ obj-$(CONFIG_U3_DART)		+= dart_iommu.o
 obj-$(CONFIG_MMIO_NVRAM)	+= mmio_nvram.o
 obj-$(CONFIG_PPC_83xx)		+= ipic.o
 obj-$(CONFIG_FSL_SOC)		+= fsl_soc.o
-obj-$(CONFIG_TSI108_BRIDGE)	+= tsi108_common.o tsi108_pic.o
+obj-$(CONFIG_TSI108_BRIDGE)	+= tsi108_common.o tsi108_pci_int.o
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index 7dcdfcb..09cb0eb 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -55,6 +55,78 @@ #define distribute_irqs	(0)
 #endif
 #endif
 
+static struct mpic_info mpic_infos[] = {
+	[0] = {	/* Original OpenPIC compatible MPIC */
+	.greg_base	= MPIC_GREG_BASE,
+	.greg_frr0	= MPIC_GREG_FEATURE_0,
+	.greg_config0	= MPIC_GREG_GLOBAL_CONF_0,
+	.greg_vendor_id	= MPIC_GREG_VENDOR_ID,
+	.greg_ipi_vp0	= MPIC_GREG_IPI_VECTOR_PRI_0,
+	.greg_ipi_stride	= MPIC_GREG_IPI_STRIDE,
+	.greg_spurious	= MPIC_GREG_SPURIOUS,
+	.greg_tfrr	= MPIC_GREG_TIMER_FREQ,
+
+	.timer_base	= MPIC_TIMER_BASE,
+	.timer_stride	= MPIC_TIMER_STRIDE,
+	.timer_ccr	= MPIC_TIMER_CURRENT_CNT,
+	.timer_bcr	= MPIC_TIMER_BASE_CNT,
+	.timer_vpr	= MPIC_TIMER_VECTOR_PRI,
+	.timer_dest	= MPIC_TIMER_DESTINATION,
+
+	.cpu_base	= MPIC_CPU_BASE,
+	.cpu_stride	= MPIC_CPU_STRIDE,
+	.cpu_ipi_disp0	= MPIC_CPU_IPI_DISPATCH_0,
+	.cpu_ipi_disp_stride	= MPIC_CPU_IPI_DISPATCH_STRIDE,
+	.cpu_task_pri	= MPIC_CPU_CURRENT_TASK_PRI,
+	.cpu_whoami	= MPIC_CPU_WHOAMI,
+	.cpu_intack	= MPIC_CPU_INTACK,
+	.cpu_eoi	= MPIC_CPU_EOI,
+
+	.irq_base	= MPIC_IRQ_BASE,
+	.irq_stride	= MPIC_IRQ_STRIDE,
+	.irq_vpr	= MPIC_IRQ_VECTOR_PRI,
+	.irq_vpr_vector	= MPIC_VECPRI_VECTOR_MASK,
+	.irq_vpr_polpos	= MPIC_VECPRI_POLARITY_POSITIVE,
+	.irq_vpr_senlvl	= MPIC_VECPRI_SENSE_LEVEL,
+	.irq_dest	= MPIC_IRQ_DESTINATION,
+	},
+
+	[1] = {	/* Tsi108/109 PIC */
+	.greg_base	= TSI108_GREG_BASE,
+	.greg_frr0	= TSI108_GREG_FEATURE_0,
+	.greg_config0	= TSI108_GREG_GLOBAL_CONF_0,
+	.greg_vendor_id	= TSI108_GREG_VENDOR_ID,
+	.greg_ipi_vp0	= TSI108_GREG_IPI_VECTOR_PRI_0,
+	.greg_ipi_stride	= TSI108_GREG_IPI_STRIDE,
+	.greg_spurious	= TSI108_GREG_SPURIOUS,
+	.greg_tfrr	= TSI108_GREG_TIMER_FREQ,
+
+	.timer_base	= TSI108_TIMER_BASE,
+	.timer_stride	= TSI108_TIMER_STRIDE,
+	.timer_ccr	= TSI108_TIMER_CURRENT_CNT,
+	.timer_bcr	= TSI108_TIMER_BASE_CNT,
+	.timer_vpr	= TSI108_TIMER_VECTOR_PRI,
+	.timer_dest	= TSI108_TIMER_DESTINATION,
+
+	.cpu_base	= TSI108_CPU_BASE,
+	.cpu_stride	= TSI108_CPU_STRIDE,
+	.cpu_ipi_disp0	= TSI108_CPU_IPI_DISPATCH_0,
+	.cpu_ipi_disp_stride	= TSI108_CPU_IPI_DISPATCH_STRIDE,
+	.cpu_task_pri	= TSI108_CPU_CURRENT_TASK_PRI,
+	.cpu_whoami	= 0xFFFFFFFF,
+	.cpu_intack	= TSI108_CPU_INTACK,
+	.cpu_eoi	= TSI108_CPU_EOI,
+
+	.irq_base	= TSI108_IRQ_REG_BASE,
+	.irq_stride	= TSI108_IRQ_STRIDE,
+	.irq_vpr	= TSI108_IRQ_VECTOR_PRI,
+	.irq_vpr_vector = TSI108_VECPRI_VECTOR_MASK,
+	.irq_vpr_polpos = TSI108_VECPRI_POLARITY_POSITIVE,
+	.irq_vpr_senlvl = TSI108_VECPRI_SENSE_LEVEL,
+	.irq_dest	= TSI108_IRQ_DESTINATION,
+	},
+};
+
 /*
  * Register accessor functions
  */
@@ -81,7 +153,8 @@ static inline void _mpic_write(unsigned 
 static inline u32 _mpic_ipi_read(struct mpic *mpic, unsigned int ipi)
 {
 	unsigned int be = (mpic->flags & MPIC_BIG_ENDIAN) != 0;
-	unsigned int offset = MPIC_GREG_IPI_VECTOR_PRI_0 + (ipi * 0x10);
+	unsigned int offset = mpic->hw_set->greg_ipi_vp0 +
+			      (ipi * mpic->hw_set->greg_ipi_stride);
 
 	if (mpic->flags & MPIC_BROKEN_IPI)
 		be = !be;
@@ -90,7 +163,8 @@ static inline u32 _mpic_ipi_read(struct 
 
 static inline void _mpic_ipi_write(struct mpic *mpic, unsigned int ipi, u32 value)
 {
-	unsigned int offset = MPIC_GREG_IPI_VECTOR_PRI_0 + (ipi * 0x10);
+	unsigned int offset = mpic->hw_set->greg_ipi_vp0 +
+			      (ipi * mpic->hw_set->greg_ipi_stride);
 
 	_mpic_write(mpic->flags & MPIC_BIG_ENDIAN, mpic->gregs, offset, value);
 }
@@ -121,7 +195,7 @@ static inline u32 _mpic_irq_read(struct 
 	unsigned int	idx = src_no & mpic->isu_mask;
 
 	return _mpic_read(mpic->flags & MPIC_BIG_ENDIAN, mpic->isus[isu],
-			  reg + (idx * MPIC_IRQ_STRIDE));
+			  reg + (idx * mpic->hw_set->irq_stride));
 }
 
 static inline void _mpic_irq_write(struct mpic *mpic, unsigned int src_no,
@@ -131,7 +205,7 @@ static inline void _mpic_irq_write(struc
 	unsigned int	idx = src_no & mpic->isu_mask;
 
 	_mpic_write(mpic->flags & MPIC_BIG_ENDIAN, mpic->isus[isu],
-		    reg + (idx * MPIC_IRQ_STRIDE), value);
+		    reg + (idx * mpic->hw_set->irq_stride), value);
 }
 
 #define mpic_read(b,r)		_mpic_read(mpic->flags & MPIC_BIG_ENDIAN,(b),(r))
@@ -157,8 +231,8 @@ static void __init mpic_test_broken_ipi(
 {
 	u32 r;
 
-	mpic_write(mpic->gregs, MPIC_GREG_IPI_VECTOR_PRI_0, MPIC_VECPRI_MASK);
-	r = mpic_read(mpic->gregs, MPIC_GREG_IPI_VECTOR_PRI_0);
+	mpic_write(mpic->gregs, mpic->hw_set->greg_ipi_vp0, MPIC_VECPRI_MASK);
+	r = mpic_read(mpic->gregs, mpic->hw_set->greg_ipi_vp0);
 
 	if (r == le32_to_cpu(MPIC_VECPRI_MASK)) {
 		printk(KERN_INFO "mpic: Detected reversed IPI registers\n");
@@ -392,8 +466,8 @@ static inline struct mpic * mpic_from_ir
 /* Send an EOI */
 static inline void mpic_eoi(struct mpic *mpic)
 {
-	mpic_cpu_write(MPIC_CPU_EOI, 0);
-	(void)mpic_cpu_read(MPIC_CPU_WHOAMI);
+	mpic_cpu_write(mpic->hw_set->cpu_eoi, 0);
+	(void)mpic_cpu_read(mpic->hw_set->cpu_task_pri);
 }
 
 #ifdef CONFIG_SMP
@@ -419,8 +493,8 @@ static void mpic_enable_irq(unsigned int
 
 	DBG("%p: %s: enable_irq: %d (src %d)\n", mpic, mpic->name, irq, src);
 
-	mpic_irq_write(src, MPIC_IRQ_VECTOR_PRI,
-		       mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) &
+	mpic_irq_write(src, mpic->hw_set->irq_vpr,
+		       mpic_irq_read(src, mpic->hw_set->irq_vpr) &
 		       ~MPIC_VECPRI_MASK);
 
 	/* make sure mask gets to controller before we return to user */
@@ -429,7 +503,7 @@ static void mpic_enable_irq(unsigned int
 			printk(KERN_ERR "mpic_enable_irq timeout\n");
 			break;
 		}
-	} while(mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) & MPIC_VECPRI_MASK);	
+	} while(mpic_irq_read(src, mpic->hw_set->irq_vpr) & MPIC_VECPRI_MASK);	
 
 #ifdef CONFIG_MPIC_BROKEN_U3
 	if (mpic->flags & MPIC_BROKEN_U3) {
@@ -466,8 +540,8 @@ static void mpic_disable_irq(unsigned in
 
 	DBG("%s: disable_irq: %d (src %d)\n", mpic->name, irq, src);
 
-	mpic_irq_write(src, MPIC_IRQ_VECTOR_PRI,
-		       mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) |
+	mpic_irq_write(src, mpic->hw_set->irq_vpr,
+		       mpic_irq_read(src, mpic->hw_set->irq_vpr) |
 		       MPIC_VECPRI_MASK);
 
 	/* make sure mask gets to controller before we return to user */
@@ -476,7 +550,7 @@ static void mpic_disable_irq(unsigned in
 			printk(KERN_ERR "mpic_enable_irq timeout\n");
 			break;
 		}
-	} while(!(mpic_irq_read(src, MPIC_IRQ_VECTOR_PRI) & MPIC_VECPRI_MASK));
+	} while(!(mpic_irq_read(src, mpic->hw_set->irq_vpr) & MPIC_VECPRI_MASK));
 }
 
 static void mpic_shutdown_irq(unsigned int irq)
@@ -557,7 +631,7 @@ static void mpic_set_affinity(unsigned i
 
 	cpus_and(tmp, cpumask, cpu_online_map);
 
-	mpic_irq_write(irq - mpic->irq_offset, MPIC_IRQ_DESTINATION,
+	mpic_irq_write(irq - mpic->irq_offset, mpic->hw_set->irq_dest,
 		       mpic_physmask(cpus_addr(tmp)[0]));	
 }
 
@@ -613,18 +687,20 @@ #endif /* CONFIG_SMP */
 	mpic->num_sources = 0; /* so far */
 	mpic->senses = senses;
 	mpic->senses_count = senses_count;
+	mpic->hw_set = &mpic_infos[MPIC_GET_MOD_ID(flags)];
 
 	/* Map the global registers */
-	mpic->gregs = ioremap(phys_addr + MPIC_GREG_BASE, 0x1000);
-	mpic->tmregs = mpic->gregs + ((MPIC_TIMER_BASE - MPIC_GREG_BASE) >> 2);
+	mpic->gregs = ioremap(phys_addr + mpic->hw_set->greg_base, 0x1000);
+	mpic->tmregs = mpic->gregs +
+		       ((mpic->hw_set->timer_base - mpic->hw_set->greg_base) >> 2);
 	BUG_ON(mpic->gregs == NULL);
 
 	/* Reset */
 	if (flags & MPIC_WANTS_RESET) {
-		mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_0,
-			   mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_0)
+		mpic_write(mpic->gregs, mpic->hw_set->greg_config0,
+			   mpic_read(mpic->gregs, mpic->hw_set->greg_config0)
 			   | MPIC_GREG_GCONF_RESET);
-		while( mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_0)
+		while( mpic_read(mpic->gregs, mpic->hw_set->greg_config0)
 		       & MPIC_GREG_GCONF_RESET)
 			mb();
 	}
@@ -633,7 +709,7 @@ #endif /* CONFIG_SMP */
 	 * MPICs, num sources as well. On ISU MPICs, sources are counted
 	 * as ISUs are added
 	 */
-	reg = mpic_read(mpic->gregs, MPIC_GREG_FEATURE_0);
+	reg = mpic_read(mpic->gregs, mpic->hw_set->greg_frr0);
 	mpic->num_cpus = ((reg & MPIC_GREG_FEATURE_LAST_CPU_MASK)
 			  >> MPIC_GREG_FEATURE_LAST_CPU_SHIFT) + 1;
 	if (isu_size == 0)
@@ -642,16 +718,16 @@ #endif /* CONFIG_SMP */
 
 	/* Map the per-CPU registers */
 	for (i = 0; i < mpic->num_cpus; i++) {
-		mpic->cpuregs[i] = ioremap(phys_addr + MPIC_CPU_BASE +
-					   i * MPIC_CPU_STRIDE, 0x1000);
+		mpic->cpuregs[i] = ioremap(phys_addr + mpic->hw_set->cpu_base +
+					   i * mpic->hw_set->cpu_stride, 0x1000);
 		BUG_ON(mpic->cpuregs[i] == NULL);
 	}
 
 	/* Initialize main ISU if none provided */
 	if (mpic->isu_size == 0) {
 		mpic->isu_size = mpic->num_sources;
-		mpic->isus[0] = ioremap(phys_addr + MPIC_IRQ_BASE,
-					MPIC_IRQ_STRIDE * mpic->isu_size);
+		mpic->isus[0] = ioremap(phys_addr + mpic->hw_set->irq_base,
+					mpic->hw_set->irq_stride * mpic->isu_size);
 		BUG_ON(mpic->isus[0] == NULL);
 	}
 	mpic->isu_shift = 1 + __ilog2(mpic->isu_size - 1);
@@ -693,7 +769,8 @@ void __init mpic_assign_isu(struct mpic 
 
 	BUG_ON(isu_num >= MPIC_MAX_ISU);
 
-	mpic->isus[isu_num] = ioremap(phys_addr, MPIC_IRQ_STRIDE * mpic->isu_size);
+	mpic->isus[isu_num] = ioremap(phys_addr,
+				      mpic->hw_set->irq_stride * mpic->isu_size);
 	if ((isu_first + mpic->isu_size) > mpic->num_sources)
 		mpic->num_sources = isu_first + mpic->isu_size;
 }
@@ -729,14 +806,15 @@ void __init mpic_init(struct mpic *mpic)
 	printk(KERN_INFO "mpic: Initializing for %d sources\n", mpic->num_sources);
 
 	/* Set current processor priority to max */
-	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0xf);
+	mpic_cpu_write(mpic->hw_set->cpu_task_pri, 0xf);
 
 	/* Initialize timers: just disable them all */
 	for (i = 0; i < 4; i++) {
 		mpic_write(mpic->tmregs,
-			   i * MPIC_TIMER_STRIDE + MPIC_TIMER_DESTINATION, 0);
+			   i * mpic->hw_set->timer_stride +
+			   mpic->hw_set->timer_dest, 0);
 		mpic_write(mpic->tmregs,
-			   i * MPIC_TIMER_STRIDE + MPIC_TIMER_VECTOR_PRI,
+			   i * mpic->hw_set->timer_stride + mpic->hw_set->timer_vpr,
 			   MPIC_VECPRI_MASK |
 			   (MPIC_VEC_TIMER_0 + i));
 	}
@@ -780,14 +858,14 @@ #endif /* CONFIG_MPIC_BROKEN_U3 */
 		/* do senses munging */
 		if (mpic->senses && i < mpic->senses_count) {
 			if (mpic->senses[i] & IRQ_SENSE_LEVEL)
-				vecpri |= MPIC_VECPRI_SENSE_LEVEL;
+				vecpri |= mpic->hw_set->irq_vpr_senlvl;
 			if (mpic->senses[i] & IRQ_POLARITY_POSITIVE)
-				vecpri |= MPIC_VECPRI_POLARITY_POSITIVE;
+				vecpri |= mpic->hw_set->irq_vpr_polpos;
 		} else
-			vecpri |= MPIC_VECPRI_SENSE_LEVEL;
+			vecpri |= mpic->hw_set->irq_vpr_senlvl;
 
 		/* remember if it was a level interrupts */
-		level = (vecpri & MPIC_VECPRI_SENSE_LEVEL);
+		level = (vecpri & mpic->hw_set->irq_vpr_senlvl);
 
 		/* deal with broken U3 */
 		if (mpic->flags & MPIC_BROKEN_U3) {
@@ -795,7 +873,7 @@ #ifdef CONFIG_MPIC_BROKEN_U3
 			if (mpic_is_ht_interrupt(mpic, i)) {
 				vecpri &= ~(MPIC_VECPRI_SENSE_MASK |
 					    MPIC_VECPRI_POLARITY_MASK);
-				vecpri |= MPIC_VECPRI_POLARITY_POSITIVE;
+				vecpri |= mpic->hw_set->irq_vpr_polpos;
 			}
 #else
 			printk(KERN_ERR "mpic: BROKEN_U3 set, but CONFIG doesn't match\n");
@@ -806,8 +884,8 @@ #endif
 		    (level != 0));
 
 		/* init hw */
-		mpic_irq_write(i, MPIC_IRQ_VECTOR_PRI, vecpri);
-		mpic_irq_write(i, MPIC_IRQ_DESTINATION,
+		mpic_irq_write(i, mpic->hw_set->irq_vpr, vecpri);
+		mpic_irq_write(i, mpic->hw_set->irq_dest,
 			       1 << hard_smp_processor_id());
 
 		/* init linux descriptors */
@@ -818,15 +896,16 @@ #endif
 	}
 	
 	/* Init spurrious vector */
-	mpic_write(mpic->gregs, MPIC_GREG_SPURIOUS, MPIC_VEC_SPURRIOUS);
+	mpic_write(mpic->gregs, mpic->hw_set->greg_spurious, MPIC_VEC_SPURRIOUS);
 
-	/* Disable 8259 passthrough */
-	mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_0,
-		   mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_0)
-		   | MPIC_GREG_GCONF_8259_PTHROU_DIS);
+	/* Disable 8259 passthrough, if supported */
+	if (MPIC_GET_MOD_ID(mpic->flags) != MPIC_ID_TSI108)
+		mpic_write(mpic->gregs, mpic->hw_set->greg_config0,
+			   mpic_read(mpic->gregs, mpic->hw_set->greg_config0)
+			   | MPIC_GREG_GCONF_8259_PTHROU_DIS);
 
 	/* Set current processor priority to 0 */
-	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0);
+	mpic_cpu_write(mpic->hw_set->cpu_task_pri, 0);
 }
 
 
@@ -845,9 +924,9 @@ void mpic_irq_set_priority(unsigned int 
 		mpic_ipi_write(irq - mpic->ipi_offset,
 			       reg | (pri << MPIC_VECPRI_PRIORITY_SHIFT));
 	} else {
-		reg = mpic_irq_read(irq - mpic->irq_offset,MPIC_IRQ_VECTOR_PRI)
+		reg = mpic_irq_read(irq - mpic->irq_offset,mpic->hw_set->irq_vpr)
 			& ~MPIC_VECPRI_PRIORITY_MASK;
-		mpic_irq_write(irq - mpic->irq_offset, MPIC_IRQ_VECTOR_PRI,
+		mpic_irq_write(irq - mpic->irq_offset, mpic->hw_set->irq_vpr,
 			       reg | (pri << MPIC_VECPRI_PRIORITY_SHIFT));
 	}
 	spin_unlock_irqrestore(&mpic_lock, flags);
@@ -864,7 +943,7 @@ unsigned int mpic_irq_get_priority(unsig
 	if (is_ipi)
 		reg = mpic_ipi_read(irq - mpic->ipi_offset);
 	else
-		reg = mpic_irq_read(irq - mpic->irq_offset, MPIC_IRQ_VECTOR_PRI);
+		reg = mpic_irq_read(irq - mpic->irq_offset, mpic->hw_set->irq_vpr);
 	spin_unlock_irqrestore(&mpic_lock, flags);
 	return (reg & MPIC_VECPRI_PRIORITY_MASK) >> MPIC_VECPRI_PRIORITY_SHIFT;
 }
@@ -890,12 +969,12 @@ #ifdef CONFIG_SMP
  	 */
 	if (distribute_irqs) {
 	 	for (i = 0; i < mpic->num_sources ; i++)
-			mpic_irq_write(i, MPIC_IRQ_DESTINATION,
-				mpic_irq_read(i, MPIC_IRQ_DESTINATION) | msk);
+			mpic_irq_write(i, mpic->hw_set->irq_dest,
+				mpic_irq_read(i, mpic->hw_set->irq_dest) | msk);
 	}
 
 	/* Set current processor priority to 0 */
-	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0);
+	mpic_cpu_write(mpic->hw_set->cpu_task_pri, 0);
 
 	spin_unlock_irqrestore(&mpic_lock, flags);
 #endif /* CONFIG_SMP */
@@ -905,7 +984,7 @@ int mpic_cpu_get_priority(void)
 {
 	struct mpic *mpic = mpic_primary;
 
-	return mpic_cpu_read(MPIC_CPU_CURRENT_TASK_PRI);
+	return mpic_cpu_read(mpic->hw_set->cpu_task_pri);
 }
 
 void mpic_cpu_set_priority(int prio)
@@ -913,7 +992,7 @@ void mpic_cpu_set_priority(int prio)
 	struct mpic *mpic = mpic_primary;
 
 	prio &= MPIC_CPU_TASKPRI_MASK;
-	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, prio);
+	mpic_cpu_write(mpic->hw_set->cpu_task_pri, prio);
 }
 
 /*
@@ -935,11 +1014,11 @@ void mpic_teardown_this_cpu(int secondar
 
 	/* let the mpic know we don't want intrs.  */
 	for (i = 0; i < mpic->num_sources ; i++)
-		mpic_irq_write(i, MPIC_IRQ_DESTINATION,
-			mpic_irq_read(i, MPIC_IRQ_DESTINATION) & ~msk);
+		mpic_irq_write(i, mpic->hw_set->irq_dest,
+			mpic_irq_read(i, mpic->hw_set->irq_dest) & ~msk);
 
 	/* Set current processor priority to max */
-	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0xf);
+	mpic_cpu_write(mpic->hw_set->cpu_task_pri, 0xf);
 
 	spin_unlock_irqrestore(&mpic_lock, flags);
 }
@@ -955,7 +1034,8 @@ #ifdef DEBUG_IPI
 	DBG("%s: send_ipi(ipi_no: %d)\n", mpic->name, ipi_no);
 #endif
 
-	mpic_cpu_write(MPIC_CPU_IPI_DISPATCH_0 + ipi_no * 0x10,
+	mpic_cpu_write(mpic->hw_set->cpu_ipi_disp0 +
+		       ipi_no * mpic->hw_set->cpu_ipi_disp_stride,
 		       mpic_physmask(cpu_mask & cpus_addr(cpu_online_map)[0]));
 }
 
@@ -963,7 +1043,7 @@ int mpic_get_one_irq(struct mpic *mpic, 
 {
 	u32 irq;
 
-	irq = mpic_cpu_read(MPIC_CPU_INTACK) & MPIC_VECPRI_VECTOR_MASK;
+	irq = mpic_cpu_read(mpic->hw_set->cpu_intack) & mpic->hw_set->irq_vpr_vector;
 #ifdef DEBUG_LOW
 	DBG("%s: get_one_irq(): %d\n", mpic->name, irq);
 #endif
@@ -972,11 +1052,18 @@ #ifdef DEBUG_LOW
 		DBG("%s: cascading ...\n", mpic->name);
 #endif
 		irq = mpic->cascade(regs, mpic->cascade_data);
-		mpic_eoi(mpic);
+#ifdef DEBUG_LOW
+		DBG("%s: cascaded irq: %d\n", mpic->name, irq);
+#endif
+		if (!(mpic->flags & MPIC_CASC_NOEOI))
+			mpic_eoi(mpic);
 		return irq;
 	}
-	if (unlikely(irq == MPIC_VEC_SPURRIOUS))
+	if (unlikely(irq == MPIC_VEC_SPURRIOUS)) {
+		if (mpic->flags & MPIC_SPV_EOI)
+			mpic_eoi(mpic);
 		return -1;
+	}
 	if (irq < MPIC_VEC_IPI_0) {
 #ifdef DEBUG_IRQ
 		DBG("%s: irq %d\n", mpic->name, irq + mpic->irq_offset);
diff --git a/include/asm-powerpc/mpic.h b/include/asm-powerpc/mpic.h
index 6b9e781..6fa3427 100644
--- a/include/asm-powerpc/mpic.h
+++ b/include/asm-powerpc/mpic.h
@@ -37,6 +37,7 @@ #define MPIC_GREG_IPI_VECTOR_PRI_0	0x000
 #define MPIC_GREG_IPI_VECTOR_PRI_1	0x000b0
 #define MPIC_GREG_IPI_VECTOR_PRI_2	0x000c0
 #define MPIC_GREG_IPI_VECTOR_PRI_3	0x000d0
+#define MPIC_GREG_IPI_STRIDE		0x10
 #define MPIC_GREG_SPURIOUS		0x000e0
 #define MPIC_GREG_TIMER_FREQ		0x000f0
 
@@ -64,6 +65,7 @@ #define MPIC_CPU_IPI_DISPATCH_0		0x00040
 #define MPIC_CPU_IPI_DISPATCH_1		0x00050
 #define MPIC_CPU_IPI_DISPATCH_2		0x00060
 #define MPIC_CPU_IPI_DISPATCH_3		0x00070
+#define MPIC_CPU_IPI_DISPATCH_STRIDE	0x00010
 #define MPIC_CPU_CURRENT_TASK_PRI	0x00080
 #define 	MPIC_CPU_TASKPRI_MASK			0x0000000f
 #define MPIC_CPU_WHOAMI			0x00090
@@ -91,6 +93,55 @@ #define 	MPIC_VECPRI_SENSE_EDGE			0x0000
 #define 	MPIC_VECPRI_SENSE_MASK			0x00400000
 #define MPIC_IRQ_DESTINATION		0x00010
 
+/******************************************************************************
+ * Tsi108 implementation of MPIC has many differences form the original one
+ */
+
+/*
+ * Global registers
+ */
+
+#define TSI108_GREG_BASE		0x00000
+#define TSI108_GREG_FEATURE_0		0x00000
+#define TSI108_GREG_GLOBAL_CONF_0	0x00004
+#define TSI108_GREG_VENDOR_ID		0x0000c
+#define TSI108_GREG_IPI_VECTOR_PRI_0	0x00204		/* Doorbell 0 */
+#define TSI108_GREG_IPI_STRIDE		0x0c
+#define TSI108_GREG_SPURIOUS		0x00010
+#define TSI108_GREG_TIMER_FREQ		0x00014
+
+/*
+ * Timer registers
+ */
+#define TSI108_TIMER_BASE		0x0030
+#define TSI108_TIMER_STRIDE		0x10
+#define TSI108_TIMER_CURRENT_CNT	0x00000
+#define TSI108_TIMER_BASE_CNT		0x00004
+#define TSI108_TIMER_VECTOR_PRI		0x00008
+#define TSI108_TIMER_DESTINATION	0x0000c
+
+/*
+ * Per-Processor registers
+ */
+#define TSI108_CPU_BASE			0x00300
+#define TSI108_CPU_STRIDE		0x00040
+#define TSI108_CPU_IPI_DISPATCH_0	0x00200
+#define TSI108_CPU_IPI_DISPATCH_STRIDE	0x00000
+#define TSI108_CPU_CURRENT_TASK_PRI	0x00000
+#define TSI108_CPU_INTACK		0x00004
+#define TSI108_CPU_EOI			0x00008
+
+/*
+ * Per-source registers
+ */
+#define TSI108_IRQ_REG_BASE		0x00100
+#define TSI108_IRQ_STRIDE		0x00008
+#define TSI108_IRQ_VECTOR_PRI		0x00000
+#define 	TSI108_VECPRI_VECTOR_MASK		0x000000ff
+#define 	TSI108_VECPRI_POLARITY_POSITIVE		0x01000000
+#define 	TSI108_VECPRI_SENSE_LEVEL		0x02000000
+#define TSI108_IRQ_DESTINATION		0x00004
+
 #define MPIC_MAX_IRQ_SOURCES	2048
 #define MPIC_MAX_CPUS		32
 #define MPIC_MAX_ISU		32
@@ -124,6 +175,40 @@ struct mpic_irq_fixup
 };
 #endif /* CONFIG_MPIC_BROKEN_U3 */
 
+struct mpic_info {
+	u32	greg_base;	/* offset of global registers from MPIC base */
+	u32	greg_frr0;	/* FRR0 offset from base */
+	u32	greg_config0;	/* Global Config register offset from base */
+	u32	greg_vendor_id;	/* VID register offset from base */
+	u32	greg_ipi_vp0;	/* IPI Vector/Priority Registers */
+	u32	greg_ipi_stride; /* IPI Vector/Priority Registers spacing */
+	u32	greg_spurious;	/* Spurious Vector Register */
+	u32	greg_tfrr;	/* Global Timer Frequency Reporting Register */
+
+	u32	timer_base;	/* Global Timer Registers base */
+	u32	timer_stride;	/* Global Timer Registers spacing */
+	u32	timer_ccr;	/* Global Timer Current Count Register */
+	u32	timer_bcr;	/* Global Timer Base Count Register */
+	u32	timer_vpr;	/* Global Timer Vector/Priority Register */
+	u32	timer_dest;	/* Global Timer Destination Register */
+
+	u32	cpu_base;	/* Global Timer Destination Register */
+	u32	cpu_stride;	/* Global Timer Destination Register */
+	u32	cpu_ipi_disp0;	/* IPI 0 Dispatch Command Register */
+	u32	cpu_ipi_disp_stride;	/* IPI Dispatch spacing */
+	u32	cpu_task_pri;	/* Processor Current Task Priority Register */
+	u32	cpu_whoami;	/* Who Am I Register */
+	u32	cpu_intack;	/* Interrupt Acknowledge Register */
+	u32	cpu_eoi;	/* End of Interrupt Register */
+
+	u32	irq_base;	/* Interrupt registers base */
+	u32	irq_stride;	/* Interrupt registers spacing */
+	u32	irq_vpr;	/* Interrupt Vector/Priority Register */
+	u32	irq_vpr_vector;	/* Interrupt Vector Mask */
+	u32	irq_vpr_polpos;	/* Interrupt Positive Polarity bit */
+	u32	irq_vpr_senlvl;	/* Interrupt Level Sense bit */
+	u32	irq_dest;	/* Interrupt Destination Register */
+};
 
 /* The instance data of a given MPIC */
 struct mpic
@@ -168,6 +253,8 @@ #endif
 	volatile u32 __iomem	*tmregs;
 	volatile u32 __iomem	*cpuregs[MPIC_MAX_CPUS];
 	volatile u32 __iomem	*isus[MPIC_MAX_ISU];
+	/* Pointer to HW info structure */
+	struct mpic_info	*hw_set;
 
 	/* link */
 	struct mpic		*next;
@@ -186,6 +273,16 @@ #define MPIC_BROKEN_U3			0x00000004
 #define MPIC_BROKEN_IPI			0x00000008
 /* MPIC wants a reset */
 #define MPIC_WANTS_RESET		0x00000010
+/* Spurious vector requires EOI */
+#define MPIC_SPV_EOI			0x00000020
+/* No EOI for cascaded interrupt */
+#define MPIC_CASC_NOEOI			0x00000040
+/* MPIC HW modification ID */
+#define MPIC_MOD_ID_MASK		0x00000f00
+#define MPIC_MOD_ID(val)		(((val) << 8) & MPIC_MOD_ID_MASK)
+#define MPIC_GET_MOD_ID(flags)		(((flags) & MPIC_MOD_ID_MASK) >> 8)
+#define		MPIC_ID_MPIC		0	/* Original MPIC */
+#define		MPIC_ID_TSI108		1	/* Tsi108/109 PIC */
 
 /* Allocate the controller structure and setup the linux irq descs
  * for the range if interrupts passed in. No HW initialization is

^ permalink raw reply related

* Re: snd-aoa: using feature calls for GPIOs
From: Benjamin Herrenschmidt @ 2006-06-06  9:15 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linuxppc-dev list
In-Reply-To: <1149578692.5928.6.camel@johannes.berg>

On Tue, 2006-06-06 at 09:24 +0200, Johannes Berg wrote:
> Here's a new patch. This one works for me if I invert the polarity of
> the hw_reset GPIO, but that doesn't seem right to me. But maybe it
> doesn't matter because the mini comes with explicit polarity indicators
> in the device tree and we can see what happens there? I'll implement the
> toonie codec in a bit.

I would say don't expect polarities to be any use when pmf are
available. That's why you get those explicit polarity infos on earlier
models. It's generally expected for a reset line to be active low (you
"assert" a reset by pulling it low to 0 and it's normally an open
collector with a pull-up to 5v. That's also why in general, reset lines
are never set to "1" but to "tristate" (or input mode for a GPIO) when
"released" but just do what darwin does there.

Ben.


> johannes
> 
> --- snd-aoa.orig/aoa.h	2006-06-06 09:21:44.341828919 +0200
> +++ snd-aoa/aoa.h	2006-06-06 09:22:08.341828919 +0200
> @@ -125,6 +125,7 @@ extern int aoa_snd_ctl_add(struct snd_kc
>  
>  /* GPIO stuff */
>  extern struct gpio_methods *pmf_gpio_methods;
> +extern struct gpio_methods *ftr_gpio_methods;
>  /* extern struct gpio_methods *map_gpio_methods; */
>  
>  #endif /* __AOA_H */
> --- snd-aoa.orig/core/Makefile	2006-06-06 09:21:44.481828919 +0200
> +++ snd-aoa/core/Makefile	2006-06-06 09:22:08.541828919 +0200
> @@ -1,4 +1,5 @@
>  obj-$(CONFIG_SND_AOA) += snd-aoa.o
>  snd-aoa-objs := snd-aoa-core.o \
>  		snd-aoa-alsa.o \
> -		snd-aoa-gpio-pmf.o
> +		snd-aoa-gpio-pmf.o \
> +		snd-aoa-gpio-feature.o
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ snd-aoa/core/snd-aoa-gpio-feature.c	2006-06-06 09:22:08.581828919 +0200
> @@ -0,0 +1,339 @@
> +/*
> + * Apple Onboard Audio feature call GPIO control
> + *
> + * Copyright 2006 Johannes Berg <johannes@sipsolutions.net>
> + *
> + * GPL v2, can be found in COPYING.
> + */
> +
> +#include <asm/pmac_feature.h>
> +#include <linux/interrupt.h>
> +#include "../aoa.h"
> +
> +static int headphone_mute_gpio;
> +static int amp_mute_gpio;
> +static int lineout_mute_gpio;
> +static int hw_reset_gpio;
> +static int lineout_detect_gpio;
> +static int headphone_detect_gpio;
> +static int linein_detect_gpio;
> +
> +static int headphone_mute_gpio_activestate;
> +static int amp_mute_gpio_activestate;
> +static int lineout_mute_gpio_activestate;
> +static int hw_reset_gpio_activestate;
> +static int lineout_detect_gpio_activestate;
> +static int headphone_detect_gpio_activestate;
> +static int linein_detect_gpio_activestate;
> +
> +static int lineout_detect_irq;
> +static int linein_detect_irq;
> +static int headphone_detect_irq;
> +
> +static void get_gpio(char *name, int *gpioptr, int *gpioactiveptr)
> +{
> +	struct device_node *np;
> +	u32 *reg;
> +
> +	*gpioptr = -1;
> +
> +	np = of_find_node_by_name(NULL, name);
> +	if (!np)
> +		return;
> +
> +	reg = (u32 *)get_property(np, "reg", NULL);
> +	if (!reg)
> +		return;
> +
> +	*gpioptr = *reg;
> +
> +	/* this is a hack, usually the GPIOs 'reg' property
> +	 * should have the offset based from the GPIO space
> +	 * which is at 0x50, but apparently not always... */
> +	if (*gpioptr < 0x50)
> +		*gpioptr += 0x50;
> +
> +	reg = (u32 *)get_property(np, "audio-gpio-active-state", NULL);
> +	if (!reg)
> +		*gpioactiveptr = 1;
> +	else
> +		*gpioactiveptr = *reg;
> +
> +	printk(KERN_DEBUG "gpio %s = %d (active = %d)\n", name, *gpioptr, *gpioactiveptr);
> +}
> +
> +static void get_irq(char *name, int *irqptr)
> +{
> +	struct device_node *np;
> +
> +	*irqptr = -1;
> +	np = of_find_node_by_name(NULL, name);
> +	if (!np)
> +		return;
> +	if (np->n_intrs != 1)
> +		return;
> +	*irqptr = np->intrs[0].line;
> +
> +	printk(KERN_DEBUG "got %s irq = %d\n", name, *irqptr);
> +}
> +
> +#define SWITCH_GPIO(name, v, on)	\
> +	(((v)&~1) | ((on)?(name##_gpio_activestate==0?4:5):(name##_gpio_activestate==0?5:4)))
> +
> +#define FTR_GPIO(name, bit)					\
> +static void ftr_gpio_set_##name(struct gpio_runtime *rt, int on)\
> +{								\
> +	int v;							\
> +								\
> +	if (unlikely(!rt)) return;				\
> +								\
> +	if (name##_mute_gpio < 0)				\
> +		return;						\
> +								\
> +	v = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL,		\
> +			  name##_mute_gpio,			\
> +			  0);					\
> +	printk(KERN_DEBUG "gpio " #name " is %x\n", v);		\
> +								\
> +	v = SWITCH_GPIO(name##_mute, v, on);			\
> +	printk(KERN_DEBUG "writing %x to " #name " gpio\n", v);	\
> +								\
> +	pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL,		\
> +			  name##_mute_gpio,			\
> +			  SWITCH_GPIO(name##_mute, v, on));	\
> +								\
> +	rt->implementation_private &= ~(1<<bit);		\
> +	rt->implementation_private |= (!!on << bit);		\
> +}								\
> +static int ftr_gpio_get_##name(struct gpio_runtime *rt)		\
> +{								\
> +	if (unlikely(!rt)) return 0;				\
> +	return (rt->implementation_private>>bit)&1;		\
> +}
> +
> +FTR_GPIO(headphone, 0);
> +FTR_GPIO(amp, 1);
> +FTR_GPIO(lineout, 2);
> +
> +static void ftr_gpio_set_hw_reset(struct gpio_runtime *rt, int on)
> +{
> +	int v;
> +
> +	if (unlikely(!rt)) return;
> +	if (hw_reset_gpio < 0)
> +		return;
> +
> +	v = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL,
> +			  hw_reset_gpio, 0);
> +	printk(KERN_DEBUG "hw_reset gpio is %x\n", v);
> +	v = SWITCH_GPIO(hw_reset, v, on);
> +	printk(KERN_DEBUG "writing %x to hw_reset gpio\n", v);
> +	pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL,
> +			  hw_reset_gpio, v);
> +}
> +
> +static void ftr_gpio_all_amps_off(struct gpio_runtime *rt)
> +{
> +	int saved;
> +
> +	if (unlikely(!rt)) return;
> +	saved = rt->implementation_private;
> +	ftr_gpio_set_headphone(rt, 0);
> +	ftr_gpio_set_amp(rt, 0);
> +	ftr_gpio_set_lineout(rt, 0);
> +	rt->implementation_private = saved;
> +}
> +
> +static void ftr_gpio_all_amps_restore(struct gpio_runtime *rt)
> +{
> +	int s;
> +
> +	if (unlikely(!rt)) return;
> +	s = rt->implementation_private;
> +	ftr_gpio_set_headphone(rt, (s>>0)&1);
> +	ftr_gpio_set_amp(rt, (s>>1)&1);
> +	ftr_gpio_set_lineout(rt, (s>>2)&1);
> +}
> +
> +static void ftr_handle_notify(void *data)
> +{
> +	struct gpio_notification *notif = data;
> +
> +	mutex_lock(&notif->mutex);
> +	if (notif->notify)
> +		notif->notify(notif->data);
> +	mutex_unlock(&notif->mutex);
> +}
> +
> +static void ftr_gpio_init(struct gpio_runtime *rt)
> +{
> +	get_gpio("headphone-mute", &headphone_mute_gpio,
> +				   &headphone_mute_gpio_activestate);
> +	get_gpio("amp-mute", &amp_mute_gpio,
> +			     &amp_mute_gpio_activestate);
> +	get_gpio("lineout-mute", &lineout_mute_gpio,
> +				 &lineout_mute_gpio_activestate);
> +	get_gpio("hw-reset", &hw_reset_gpio,
> +			     &hw_reset_gpio_activestate);
> +	get_gpio("headphone-detect", &headphone_detect_gpio,
> +				     &headphone_detect_gpio_activestate);
> +	get_gpio("lineout-detect", &lineout_detect_gpio,
> +				   &lineout_detect_gpio_activestate);
> +	get_gpio("linein-detect", &linein_detect_gpio,
> +				  &linein_detect_gpio_activestate);
> +
> +	get_irq("headphone-detect", &headphone_detect_irq);
> +	get_irq("lineout-detect", &lineout_detect_irq);
> +	get_irq("linein-detect", &linein_detect_irq);
> +
> +	ftr_gpio_all_amps_off(rt);
> +	rt->implementation_private = 0;
> +	INIT_WORK(&rt->headphone_notify.work, ftr_handle_notify, &rt->headphone_notify);
> +	INIT_WORK(&rt->line_in_notify.work, ftr_handle_notify, &rt->line_in_notify);
> +	INIT_WORK(&rt->line_out_notify.work, ftr_handle_notify, &rt->line_out_notify);
> +	mutex_init(&rt->headphone_notify.mutex);
> +	mutex_init(&rt->line_in_notify.mutex);
> +	mutex_init(&rt->line_out_notify.mutex);
> +}
> +
> +static void ftr_gpio_exit(struct gpio_runtime *rt)
> +{
> +	ftr_gpio_all_amps_off(rt);
> +	rt->implementation_private = 0;
> +	if (rt->headphone_notify.notify)
> +		free_irq(headphone_detect_irq, &rt->headphone_notify);
> +	if (rt->line_in_notify.gpio_private)
> +		free_irq(linein_detect_irq, &rt->line_in_notify);
> +	if (rt->line_out_notify.gpio_private)
> +		free_irq(lineout_detect_irq, &rt->line_out_notify);
> +	cancel_delayed_work(&rt->headphone_notify.work);
> +	cancel_delayed_work(&rt->line_in_notify.work);
> +	cancel_delayed_work(&rt->line_out_notify.work);
> +	flush_scheduled_work();
> +	mutex_destroy(&rt->headphone_notify.mutex);
> +	mutex_destroy(&rt->line_in_notify.mutex);
> +	mutex_destroy(&rt->line_out_notify.mutex);
> +}
> +
> +irqreturn_t ftr_handle_notify_irq(int xx, void *data, struct pt_regs *regs)
> +{
> +	struct gpio_notification *notif = data;
> +
> +	schedule_work(&notif->work);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int ftr_set_notify(struct gpio_runtime *rt,
> +			  enum notify_type type,
> +			  notify_func_t notify,
> +			  void *data)
> +{
> +	struct gpio_notification *notif;
> +	notify_func_t old;
> +	int irq;
> +	char *name;
> +	int err = -EBUSY;
> +
> +	switch (type) {
> +	case AOA_NOTIFY_HEADPHONE:
> +		notif = &rt->headphone_notify;
> +		name = "headphone-detect";
> +		irq = headphone_detect_irq;
> +		break;
> +	case AOA_NOTIFY_LINE_IN:
> +		notif = &rt->line_in_notify;
> +		name = "linein-detect";
> +		irq = linein_detect_irq;
> +		break;
> +	case AOA_NOTIFY_LINE_OUT:
> +		notif = &rt->line_out_notify;
> +		name = "lineout-detect";
> +		irq = lineout_detect_irq;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (irq == -1)
> +		return -ENODEV;
> +
> +	mutex_lock(&notif->mutex);
> +
> +	old = notif->notify;
> +
> +	if (!old && !notify) {
> +		err = 0;
> +		goto out_unlock;
> +	}
> +
> +	if (old && notify) {
> +		if (old == notify && notif->data == data)
> +			err = 0;
> +		goto out_unlock;
> +	}
> +
> +	if (old && !notify) {
> +		free_irq(irq, notif);
> +	}
> +	if (!old && notify) {
> +		request_irq(irq, ftr_handle_notify_irq, 0, name, notif);
> +	}
> +	notif->notify = notify;
> +	notif->data = data;
> +
> +	err = 0;
> + out_unlock:
> +	mutex_unlock(&notif->mutex);
> +	return err;
> +}
> +
> +static int ftr_get_detect(struct gpio_runtime *rt,
> +			  enum notify_type type)
> +{
> +	int gpio, ret, active;
> +
> +	switch (type) {
> +	case AOA_NOTIFY_HEADPHONE:
> +		gpio = headphone_detect_gpio;
> +		active = headphone_detect_gpio_activestate;
> +		break;
> +	case AOA_NOTIFY_LINE_IN:
> +		gpio = linein_detect_gpio;
> +		active = linein_detect_gpio_activestate;
> +		break;
> +	case AOA_NOTIFY_LINE_OUT:
> +		gpio = lineout_detect_gpio;
> +		active = lineout_detect_gpio_activestate;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (gpio == -1)
> +		return -ENODEV;
> +
> +	ret = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, gpio, 0);
> +	if (ret < 0)
> +		return ret;
> +	return ((ret >> 1) & 1) == active;
> +}
> +
> +static struct gpio_methods methods = {
> +	.init			= ftr_gpio_init,
> +	.exit			= ftr_gpio_exit,
> +	.all_amps_off		= ftr_gpio_all_amps_off,
> +	.all_amps_restore	= ftr_gpio_all_amps_restore,
> +	.set_headphone		= ftr_gpio_set_headphone,
> +	.set_speakers		= ftr_gpio_set_amp,
> +	.set_lineout		= ftr_gpio_set_lineout,
> +	.set_hw_reset		= ftr_gpio_set_hw_reset,
> +	.get_headphone		= ftr_gpio_get_headphone,
> +	.get_speakers		= ftr_gpio_get_amp,
> +	.get_lineout		= ftr_gpio_get_lineout,
> +	.set_notify		= ftr_set_notify,
> +	.get_detect		= ftr_get_detect,
> +};
> +
> +struct gpio_methods *ftr_gpio_methods = &methods;
> +EXPORT_SYMBOL_GPL(ftr_gpio_methods);
> --- snd-aoa.orig/fabrics/snd-aoa-fabric-layout.c	2006-06-06 09:22:13.251828919 +0200
> +++ snd-aoa/fabrics/snd-aoa-fabric-layout.c	2006-06-06 09:22:17.051828919 +0200
> @@ -749,7 +749,11 @@ static int aoa_fabric_layout_probe(struc
>  	ldev->sound = sound;
>  	ldev->layout = layout;
>  	ldev->gpio.node = sound->parent;
> -	ldev->gpio.methods = pmf_gpio_methods;
> +	if (layout->layout_id == 58)
> +		/* only on the Mac Mini ... */
> +		ldev->gpio.methods = ftr_gpio_methods;
> +	else
> +		ldev->gpio.methods = pmf_gpio_methods;
>  	ldev->selfptr_headphone.ptr = ldev;
>  	ldev->selfptr_lineout.ptr = ldev;
>  	sdev->ofdev.dev.driver_data = ldev;
> 

^ permalink raw reply

* Using RS232 Terminal as input device
From: hbruegge @ 2006-06-06  8:08 UTC (permalink / raw)
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

Hi all


I am using the kernel (2.6.15) in an ppc-based embedded system through a 
serial console on ttyS0 with no problems.

Now I attached a graphic card to the system, which is correctly 
recognized and initialised by the Kernel. If I add "console=tty0 
console=ttyS0,57600" to the bootargs, I have the boot messages of the 
kernel on the screen and a login on the ttyS0.

What I now want to do is to disable the serial console and use the 
graphic card as output and the raw character data (ASCII) coming in from 
the terminal connected to the rs232 as input.

I have a login on the screen, if I delete the line console=ttyS0,57600 
from the bootargs, but unfortunately no input if a key is pressed on the 
terminal keyboard.


So my problem and question is, apart from not totally understanding the 
input device system of 2.6.x ;-)
Has anyone ever done that?
Is it just a kernel configuration problem? If so what modules have to be 
compiled in?
Are there any good documentaion about the new input device system ?

Basically I am totally confused ;-), so any ideas are appreciated.

Cheers

Harald

[-- Attachment #2: Type: text/html, Size: 1411 bytes --]

^ permalink raw reply

* eth0: tx queue full
From: salvatore cusenza @ 2006-06-06  8:13 UTC (permalink / raw)
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 1238 bytes --]

At runtime during the usual life of my board (MPC852 and linux-2.4.20 Denk's
distribution)
 I have experienced the following crash:


eth0: tx queue full!.
eth0: tx queue full!.
eth0: tx queue full!.

Oops: kernel access of bad area, sig: 11
NIP: C000D440 XER: 00000000 LR: C00BB040 SP: C0C9BC10 REGS: c0c9bb60 TRAP:
0300    Tainted: P
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 00001F9D, DSISR: 000000E4
TASK = c0c9a000[145] 'L5421' Last syscall: 4
last math 00000000 last altivec 00000000
GPR00: 00000000 C0C9BC10 C0C9A000 C0F56D70 00001F99 0000003C C0F56D6C
00000007
GPR08: 00000001 0000003C 00000000 C0F56DB0 C0D83C3C 10071D28 00000000
C3120000
GPR16: C311CB04 C311C8D8 C311C754 C0170000 C3120000 C311CB30 00000001
C0169DA0
GPR24: F0000E00 00001F9D C0F58400 0000003C 00000040 C2080100 C0F58200
C0F501B0
Call backtrace:
C00BAF8C C00BABC8 C0005848 C3119448 C31194C0 C31194F8 C00066F8
C0011A48 C00BA8FC C00CADD8 C00C3F00 C0016B50 C00AFE4C C00B4B94
C00B5EF4 C003571C C000457C 0FFD5E4C 0FEDB8DC 0FEDB284 1003B5B8
1003D558 1003A88C 0FED34A4 0FED32D0 0FFCFEE4 0FD5F590
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
 <0>Rebooting in 180 seconds..


Could you suggest me something to investigate?

[-- Attachment #2: Type: text/html, Size: 1394 bytes --]

^ permalink raw reply

* Re: snd-aoa: using feature calls for GPIOs
From: Johannes Berg @ 2006-06-06  7:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list
In-Reply-To: <1149477113.8543.1.camel@localhost.localdomain>

Here's a new patch. This one works for me if I invert the polarity of
the hw_reset GPIO, but that doesn't seem right to me. But maybe it
doesn't matter because the mini comes with explicit polarity indicators
in the device tree and we can see what happens there? I'll implement the
toonie codec in a bit.

johannes

--- snd-aoa.orig/aoa.h	2006-06-06 09:21:44.341828919 +0200
+++ snd-aoa/aoa.h	2006-06-06 09:22:08.341828919 +0200
@@ -125,6 +125,7 @@ extern int aoa_snd_ctl_add(struct snd_kc
 
 /* GPIO stuff */
 extern struct gpio_methods *pmf_gpio_methods;
+extern struct gpio_methods *ftr_gpio_methods;
 /* extern struct gpio_methods *map_gpio_methods; */
 
 #endif /* __AOA_H */
--- snd-aoa.orig/core/Makefile	2006-06-06 09:21:44.481828919 +0200
+++ snd-aoa/core/Makefile	2006-06-06 09:22:08.541828919 +0200
@@ -1,4 +1,5 @@
 obj-$(CONFIG_SND_AOA) += snd-aoa.o
 snd-aoa-objs := snd-aoa-core.o \
 		snd-aoa-alsa.o \
-		snd-aoa-gpio-pmf.o
+		snd-aoa-gpio-pmf.o \
+		snd-aoa-gpio-feature.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ snd-aoa/core/snd-aoa-gpio-feature.c	2006-06-06 09:22:08.581828919 +0200
@@ -0,0 +1,339 @@
+/*
+ * Apple Onboard Audio feature call GPIO control
+ *
+ * Copyright 2006 Johannes Berg <johannes@sipsolutions.net>
+ *
+ * GPL v2, can be found in COPYING.
+ */
+
+#include <asm/pmac_feature.h>
+#include <linux/interrupt.h>
+#include "../aoa.h"
+
+static int headphone_mute_gpio;
+static int amp_mute_gpio;
+static int lineout_mute_gpio;
+static int hw_reset_gpio;
+static int lineout_detect_gpio;
+static int headphone_detect_gpio;
+static int linein_detect_gpio;
+
+static int headphone_mute_gpio_activestate;
+static int amp_mute_gpio_activestate;
+static int lineout_mute_gpio_activestate;
+static int hw_reset_gpio_activestate;
+static int lineout_detect_gpio_activestate;
+static int headphone_detect_gpio_activestate;
+static int linein_detect_gpio_activestate;
+
+static int lineout_detect_irq;
+static int linein_detect_irq;
+static int headphone_detect_irq;
+
+static void get_gpio(char *name, int *gpioptr, int *gpioactiveptr)
+{
+	struct device_node *np;
+	u32 *reg;
+
+	*gpioptr = -1;
+
+	np = of_find_node_by_name(NULL, name);
+	if (!np)
+		return;
+
+	reg = (u32 *)get_property(np, "reg", NULL);
+	if (!reg)
+		return;
+
+	*gpioptr = *reg;
+
+	/* this is a hack, usually the GPIOs 'reg' property
+	 * should have the offset based from the GPIO space
+	 * which is at 0x50, but apparently not always... */
+	if (*gpioptr < 0x50)
+		*gpioptr += 0x50;
+
+	reg = (u32 *)get_property(np, "audio-gpio-active-state", NULL);
+	if (!reg)
+		*gpioactiveptr = 1;
+	else
+		*gpioactiveptr = *reg;
+
+	printk(KERN_DEBUG "gpio %s = %d (active = %d)\n", name, *gpioptr, *gpioactiveptr);
+}
+
+static void get_irq(char *name, int *irqptr)
+{
+	struct device_node *np;
+
+	*irqptr = -1;
+	np = of_find_node_by_name(NULL, name);
+	if (!np)
+		return;
+	if (np->n_intrs != 1)
+		return;
+	*irqptr = np->intrs[0].line;
+
+	printk(KERN_DEBUG "got %s irq = %d\n", name, *irqptr);
+}
+
+#define SWITCH_GPIO(name, v, on)	\
+	(((v)&~1) | ((on)?(name##_gpio_activestate==0?4:5):(name##_gpio_activestate==0?5:4)))
+
+#define FTR_GPIO(name, bit)					\
+static void ftr_gpio_set_##name(struct gpio_runtime *rt, int on)\
+{								\
+	int v;							\
+								\
+	if (unlikely(!rt)) return;				\
+								\
+	if (name##_mute_gpio < 0)				\
+		return;						\
+								\
+	v = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL,		\
+			  name##_mute_gpio,			\
+			  0);					\
+	printk(KERN_DEBUG "gpio " #name " is %x\n", v);		\
+								\
+	v = SWITCH_GPIO(name##_mute, v, on);			\
+	printk(KERN_DEBUG "writing %x to " #name " gpio\n", v);	\
+								\
+	pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL,		\
+			  name##_mute_gpio,			\
+			  SWITCH_GPIO(name##_mute, v, on));	\
+								\
+	rt->implementation_private &= ~(1<<bit);		\
+	rt->implementation_private |= (!!on << bit);		\
+}								\
+static int ftr_gpio_get_##name(struct gpio_runtime *rt)		\
+{								\
+	if (unlikely(!rt)) return 0;				\
+	return (rt->implementation_private>>bit)&1;		\
+}
+
+FTR_GPIO(headphone, 0);
+FTR_GPIO(amp, 1);
+FTR_GPIO(lineout, 2);
+
+static void ftr_gpio_set_hw_reset(struct gpio_runtime *rt, int on)
+{
+	int v;
+
+	if (unlikely(!rt)) return;
+	if (hw_reset_gpio < 0)
+		return;
+
+	v = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL,
+			  hw_reset_gpio, 0);
+	printk(KERN_DEBUG "hw_reset gpio is %x\n", v);
+	v = SWITCH_GPIO(hw_reset, v, on);
+	printk(KERN_DEBUG "writing %x to hw_reset gpio\n", v);
+	pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL,
+			  hw_reset_gpio, v);
+}
+
+static void ftr_gpio_all_amps_off(struct gpio_runtime *rt)
+{
+	int saved;
+
+	if (unlikely(!rt)) return;
+	saved = rt->implementation_private;
+	ftr_gpio_set_headphone(rt, 0);
+	ftr_gpio_set_amp(rt, 0);
+	ftr_gpio_set_lineout(rt, 0);
+	rt->implementation_private = saved;
+}
+
+static void ftr_gpio_all_amps_restore(struct gpio_runtime *rt)
+{
+	int s;
+
+	if (unlikely(!rt)) return;
+	s = rt->implementation_private;
+	ftr_gpio_set_headphone(rt, (s>>0)&1);
+	ftr_gpio_set_amp(rt, (s>>1)&1);
+	ftr_gpio_set_lineout(rt, (s>>2)&1);
+}
+
+static void ftr_handle_notify(void *data)
+{
+	struct gpio_notification *notif = data;
+
+	mutex_lock(&notif->mutex);
+	if (notif->notify)
+		notif->notify(notif->data);
+	mutex_unlock(&notif->mutex);
+}
+
+static void ftr_gpio_init(struct gpio_runtime *rt)
+{
+	get_gpio("headphone-mute", &headphone_mute_gpio,
+				   &headphone_mute_gpio_activestate);
+	get_gpio("amp-mute", &amp_mute_gpio,
+			     &amp_mute_gpio_activestate);
+	get_gpio("lineout-mute", &lineout_mute_gpio,
+				 &lineout_mute_gpio_activestate);
+	get_gpio("hw-reset", &hw_reset_gpio,
+			     &hw_reset_gpio_activestate);
+	get_gpio("headphone-detect", &headphone_detect_gpio,
+				     &headphone_detect_gpio_activestate);
+	get_gpio("lineout-detect", &lineout_detect_gpio,
+				   &lineout_detect_gpio_activestate);
+	get_gpio("linein-detect", &linein_detect_gpio,
+				  &linein_detect_gpio_activestate);
+
+	get_irq("headphone-detect", &headphone_detect_irq);
+	get_irq("lineout-detect", &lineout_detect_irq);
+	get_irq("linein-detect", &linein_detect_irq);
+
+	ftr_gpio_all_amps_off(rt);
+	rt->implementation_private = 0;
+	INIT_WORK(&rt->headphone_notify.work, ftr_handle_notify, &rt->headphone_notify);
+	INIT_WORK(&rt->line_in_notify.work, ftr_handle_notify, &rt->line_in_notify);
+	INIT_WORK(&rt->line_out_notify.work, ftr_handle_notify, &rt->line_out_notify);
+	mutex_init(&rt->headphone_notify.mutex);
+	mutex_init(&rt->line_in_notify.mutex);
+	mutex_init(&rt->line_out_notify.mutex);
+}
+
+static void ftr_gpio_exit(struct gpio_runtime *rt)
+{
+	ftr_gpio_all_amps_off(rt);
+	rt->implementation_private = 0;
+	if (rt->headphone_notify.notify)
+		free_irq(headphone_detect_irq, &rt->headphone_notify);
+	if (rt->line_in_notify.gpio_private)
+		free_irq(linein_detect_irq, &rt->line_in_notify);
+	if (rt->line_out_notify.gpio_private)
+		free_irq(lineout_detect_irq, &rt->line_out_notify);
+	cancel_delayed_work(&rt->headphone_notify.work);
+	cancel_delayed_work(&rt->line_in_notify.work);
+	cancel_delayed_work(&rt->line_out_notify.work);
+	flush_scheduled_work();
+	mutex_destroy(&rt->headphone_notify.mutex);
+	mutex_destroy(&rt->line_in_notify.mutex);
+	mutex_destroy(&rt->line_out_notify.mutex);
+}
+
+irqreturn_t ftr_handle_notify_irq(int xx, void *data, struct pt_regs *regs)
+{
+	struct gpio_notification *notif = data;
+
+	schedule_work(&notif->work);
+
+	return IRQ_HANDLED;
+}
+
+static int ftr_set_notify(struct gpio_runtime *rt,
+			  enum notify_type type,
+			  notify_func_t notify,
+			  void *data)
+{
+	struct gpio_notification *notif;
+	notify_func_t old;
+	int irq;
+	char *name;
+	int err = -EBUSY;
+
+	switch (type) {
+	case AOA_NOTIFY_HEADPHONE:
+		notif = &rt->headphone_notify;
+		name = "headphone-detect";
+		irq = headphone_detect_irq;
+		break;
+	case AOA_NOTIFY_LINE_IN:
+		notif = &rt->line_in_notify;
+		name = "linein-detect";
+		irq = linein_detect_irq;
+		break;
+	case AOA_NOTIFY_LINE_OUT:
+		notif = &rt->line_out_notify;
+		name = "lineout-detect";
+		irq = lineout_detect_irq;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (irq == -1)
+		return -ENODEV;
+
+	mutex_lock(&notif->mutex);
+
+	old = notif->notify;
+
+	if (!old && !notify) {
+		err = 0;
+		goto out_unlock;
+	}
+
+	if (old && notify) {
+		if (old == notify && notif->data == data)
+			err = 0;
+		goto out_unlock;
+	}
+
+	if (old && !notify) {
+		free_irq(irq, notif);
+	}
+	if (!old && notify) {
+		request_irq(irq, ftr_handle_notify_irq, 0, name, notif);
+	}
+	notif->notify = notify;
+	notif->data = data;
+
+	err = 0;
+ out_unlock:
+	mutex_unlock(&notif->mutex);
+	return err;
+}
+
+static int ftr_get_detect(struct gpio_runtime *rt,
+			  enum notify_type type)
+{
+	int gpio, ret, active;
+
+	switch (type) {
+	case AOA_NOTIFY_HEADPHONE:
+		gpio = headphone_detect_gpio;
+		active = headphone_detect_gpio_activestate;
+		break;
+	case AOA_NOTIFY_LINE_IN:
+		gpio = linein_detect_gpio;
+		active = linein_detect_gpio_activestate;
+		break;
+	case AOA_NOTIFY_LINE_OUT:
+		gpio = lineout_detect_gpio;
+		active = lineout_detect_gpio_activestate;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (gpio == -1)
+		return -ENODEV;
+
+	ret = pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, gpio, 0);
+	if (ret < 0)
+		return ret;
+	return ((ret >> 1) & 1) == active;
+}
+
+static struct gpio_methods methods = {
+	.init			= ftr_gpio_init,
+	.exit			= ftr_gpio_exit,
+	.all_amps_off		= ftr_gpio_all_amps_off,
+	.all_amps_restore	= ftr_gpio_all_amps_restore,
+	.set_headphone		= ftr_gpio_set_headphone,
+	.set_speakers		= ftr_gpio_set_amp,
+	.set_lineout		= ftr_gpio_set_lineout,
+	.set_hw_reset		= ftr_gpio_set_hw_reset,
+	.get_headphone		= ftr_gpio_get_headphone,
+	.get_speakers		= ftr_gpio_get_amp,
+	.get_lineout		= ftr_gpio_get_lineout,
+	.set_notify		= ftr_set_notify,
+	.get_detect		= ftr_get_detect,
+};
+
+struct gpio_methods *ftr_gpio_methods = &methods;
+EXPORT_SYMBOL_GPL(ftr_gpio_methods);
--- snd-aoa.orig/fabrics/snd-aoa-fabric-layout.c	2006-06-06 09:22:13.251828919 +0200
+++ snd-aoa/fabrics/snd-aoa-fabric-layout.c	2006-06-06 09:22:17.051828919 +0200
@@ -749,7 +749,11 @@ static int aoa_fabric_layout_probe(struc
 	ldev->sound = sound;
 	ldev->layout = layout;
 	ldev->gpio.node = sound->parent;
-	ldev->gpio.methods = pmf_gpio_methods;
+	if (layout->layout_id == 58)
+		/* only on the Mac Mini ... */
+		ldev->gpio.methods = ftr_gpio_methods;
+	else
+		ldev->gpio.methods = pmf_gpio_methods;
 	ldev->selfptr_headphone.ptr = ldev;
 	ldev->selfptr_lineout.ptr = ldev;
 	sdev->ofdev.dev.driver_data = ldev;

^ permalink raw reply

* Re: Making Two ethernet interfaces up in Linux
From: Andy Fleming @ 2006-06-05 15:12 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: Shantanu Nalage, linuxppc-embedded
In-Reply-To: <200606032255.05997.antonio.dibacco@aruba.it>


On Jun 3, 2006, at 15:55, Antonio Di Bacco wrote:

>>          We are trying to port Linux on Xilinx Board XUPV2Pro  
>> which is
>> similar in most aspects to the Xilinx ML300 board. Linux is up and
>> running for the original board i.e. having only one ethrnet  
>> interface.
>> Now since we wanted to have the board working as router, we  
>> designed a
>> daughter board with two ethernet phy interfaces. The MACs required  
>> for
>> that are instantiated in Xilinx ....
>
> You have already the driver for the first MAC, then you should  
> start from that
> modifying the init procedure for example and all the others. Your  
> driver
> should initialize both the MACs and also create two devices calling
> init_etherdev tow times. If you post your driver I can suggest what to
> change. It is not so difficult.


Generally, it's better to modify the driver so it can be used to  
control any number of instances of the same ethernet hardware.  Then  
instantiate 2 (or how many you like) instances of the device, using  
the device layer (typically done in the board-specific code).

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox