LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: Clock running fast on Mac Mini?
From: Benjamin Herrenschmidt @ 2006-04-25  4:14 UTC (permalink / raw)
  To: Mich Lanners; +Cc: linuxppc-dev
In-Reply-To: <E1FY8kd-0000mH-PO@owl.grunz.lu>

On Mon, 2006-04-24 at 23:37 +0200, Mich Lanners wrote:
> Hello all,
> 
> I have a problem on my Mac Mini with the kernel clock, which is running
> fast.
> 
> In fact, it is gaining 10 ms every 5 seconds, as can be seen here, with
> ntpdate called every 5 seconds:
> 
> owl:~# while true; do ntpdate -u be.pool.ntp.org; sleep 5; done
> 24 Apr 23:14:17 ntpdate[2601]: adjust time server 195.47.215.226 offset 0.151197 sec
> 24 Apr 23:14:22 ntpdate[2607]: adjust time server 195.47.215.226 offset 0.160348 sec
> 24 Apr 23:14:28 ntpdate[2613]: adjust time server 195.47.215.226 offset 0.170311 sec
> [...]
> 24 Apr 23:17:15 ntpdate[2813]: adjust time server 195.47.215.226 offset 0.489635 sec
> 24 Apr 23:17:20 ntpdate[2819]: adjust time server 195.47.215.226 offset 0.499386 sec
> 24 Apr 23:17:25 ntpdate[2825]: step time server 195.47.215.226 offset 0.509342 sec
> 24 Apr 23:17:31 ntpdate[2831]: adjust time server 195.47.215.226 offset 0.013693 sec
> 24 Apr 23:17:36 ntpdate[2837]: adjust time server 195.47.215.226 offset 0.021668 sec
> 
> and so on.
> 
> This is with ntpd stopped, after a fresh boot with /etc/adjtime removed
> to make sure hwclock is not falsly tuning the kernel timekeeping (I did
> this because I once had a 2.6.16 kernel running, where clock problems
> have been reported. I thought that may have falsified ntp's drift file
> and/or hwclock's /etc/adjtime).
> 
> All this is with a 2.6.15 kernel.
> 
> My Alu powerbook with a 2.6.15 kernel does not show this problem.

Could be several things... the clock calibration data from the
device-tree incorrect, or a rouding bug we fixed recently... I'd expect
2.6.16 to be good.

Ben

^ permalink raw reply

* Re: [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V4
From: KAMEZAWA Hiroyuki @ 2006-04-25  1:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: davej, tony.luck, linux-mm, mel, ak, bob.picco, linux-kernel,
	linuxppc-dev
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>

On Mon, 24 Apr 2006 21:20:09 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:

> This is V4 of the patchset to size zones and memory holes in an
> architecture-independent manner.
> 

Could you add some documentation about 'how to use' your generic funcs ?
I think more archs can use your generic routine if well documented.

All initialization path can be written in following way ?
==
for_all_memory_region()
	add_active_range(nid, start, end)
free_area_init_nodes(max_dma, max_dma32, max_low_pfn, max_pfn);
==

And following functions are really needed ?
==
+extern void remove_all_active_ranges(void);
+extern void get_pfn_range_for_nid(unsigned int nid,
+			unsigned long *start_pfn, unsigned long *end_pfn);
+extern unsigned long find_min_pfn_with_active_regions(void);
+extern unsigned long find_max_pfn_with_active_regions(void);
+extern int early_pfn_to_nid(unsigned long pfn);
+extern void free_bootmem_with_active_regions(int nid,
+						unsigned long max_low_pfn);
+extern void sparse_memory_present_with_active_regions(int nid);
+extern unsigned long absent_pages_in_range(unsigned long start_pfn,
+						unsigned long end_pfn);

-Kame

^ permalink raw reply

* Re: [PATCH] Wire up *at syscalls
From: David Woodhouse @ 2006-04-25  0:13 UTC (permalink / raw)
  To: arnd; +Cc: linuxppc-dev
In-Reply-To: <jeu08ippb9.fsf@sykes.suse.de>


>   * please add new calls to arch/powerpc/platforms/cell/spu_callbacks.c 

Hm, do we really need that? How about something based on this instead...
note that I used CONFIG_PPC_CELL because dependencies on CONFIG_*_MODULE
in the static kernel are evil.

Syscall 224 is missing from the existing spu_syscall_table, btw.

diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 8d15226..8371f14 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -18,17 +18,29 @@ #include <linux/config.h>
 #include <asm/ppc_asm.h>
 
 #ifdef CONFIG_PPC64
-#define SYSCALL(func)		.llong	.sys_##func,.sys_##func
-#define COMPAT_SYS(func)	.llong	.sys_##func,.compat_sys_##func
-#define PPC_SYS(func)		.llong	.ppc_##func,.ppc_##func
-#define OLDSYS(func)		.llong	.sys_ni_syscall,.sys_ni_syscall
-#define SYS32ONLY(func)		.llong	.sys_ni_syscall,.compat_sys_##func
-#define SYSX(f, f3264, f32)	.llong	.f,.f3264
+
+#ifdef CONFIG_PPC_CELL
+#define SPU(x) ,x
+#else
+#define SPU(x)
+#endif
+
+#define SYSCALL(func)		.llong	.sys_##func,.sys_##func SPU(.sys_ni_syscall)
+#define SYSCALL_SPU(func)	.llong	.sys_##func,.sys_##func SPU(.sys_##func)
+#define COMPAT_SYS(func)	.llong	.sys_##func,.compat_sys_##func SPU(.sys_ni_syscall)
+#define COMPAT_SYS_SPU(func)	.llong	.sys_##func,.compat_sys_##func SPU(.sys_##func)
+#define PPC_SYS(func)		.llong	.ppc_##func,.ppc_##func SPU(.sys_ni_syscall)
+#define PPC_SYS_SPU(func)	.llong	.ppc_##func,.ppc_##func SPU(.ppc_##func)
+#define OLDSYS(func)		.llong	.sys_ni_syscall,.sys_ni_syscall SPU(.sys_ni_syscall)
+#define SYS32ONLY(func)		.llong	.sys_ni_syscall,.compat_sys_##func SPU(.sys_ni_syscall)
+#define SYSX(f, f3264, f32)	.llong	.f,.f3264 SPU(.f)
 #else
-#define SYSCALL(func)		.long	sys_##func
 #define COMPAT_SYS(func)	.long	sys_##func
+#define COMPAT_SYS_SPU(func)	.long	sys_##func
 #define PPC_SYS(func)		.long	ppc_##func
+#define PPC_SYS_SPU(func)	.long	ppc_##func
 #define OLDSYS(func)		.long	sys_##func
+#define OLDSYS_SPU(func)	.long	sys_##func
 #define SYS32ONLY(func)		.long	sys_##func
 #define SYSX(f, f3264, f32)	.long	f32
 #endif
@@ -42,175 +54,175 @@ _GLOBAL(sys_call_table)
 SYSCALL(restart_syscall)
 SYSCALL(exit)
 PPC_SYS(fork)
-SYSCALL(read)
-SYSCALL(write)
-COMPAT_SYS(open)
-SYSCALL(close)
-COMPAT_SYS(waitpid)
-COMPAT_SYS(creat)
-SYSCALL(link)
-SYSCALL(unlink)
+SYSCALL_SPU(read)
+SYSCALL_SPU(write)
+COMPAT_SYS_SPU(open)
+SYSCALL_SPU(close)
+COMPAT_SYS_SPU(waitpid)
+COMPAT_SYS_SPU(creat)
+SYSCALL_SPU(link)
+SYSCALL_SPU(unlink)
 COMPAT_SYS(execve)
-SYSCALL(chdir)
-COMPAT_SYS(time)
-SYSCALL(mknod)
-SYSCALL(chmod)
-SYSCALL(lchown)
+SYSCALL_SPU(chdir)
+COMPAT_SYS_SPU(time)
+SYSCALL_SPU(mknod)
+SYSCALL_SPU(chmod)
+SYSCALL_SPU(lchown)
 SYSCALL(ni_syscall)
 OLDSYS(stat)
-SYSX(sys_lseek,ppc32_lseek,sys_lseek)
-SYSCALL(getpid)
+SYSX_SPU(sys_lseek,ppc32_lseek,sys_lseek)
+SYSCALL_SPU(getpid)
 COMPAT_SYS(mount)
 SYSX(sys_ni_syscall,sys_oldumount,sys_oldumount)
-SYSCALL(setuid)
-SYSCALL(getuid)
-COMPAT_SYS(stime)
+SYSCALL_SPU(setuid)
+SYSCALL_SPU(getuid)
+COMPAT_SYS_SPU(stime)
 COMPAT_SYS(ptrace)
-SYSCALL(alarm)
+SYSCALL_SPU(alarm)
 OLDSYS(fstat)
 COMPAT_SYS(pause)
 COMPAT_SYS(utime)
 SYSCALL(ni_syscall)
 SYSCALL(ni_syscall)
-COMPAT_SYS(access)
-COMPAT_SYS(nice)
+COMPAT_SYS_SPU(access)
+COMPAT_SYS_SPU(nice)
 SYSCALL(ni_syscall)
-SYSCALL(sync)
-COMPAT_SYS(kill)
-SYSCALL(rename)
-COMPAT_SYS(mkdir)
-SYSCALL(rmdir)
-SYSCALL(dup)
-SYSCALL(pipe)
-COMPAT_SYS(times)
+SYSCALL_SPU(sync)
+COMPAT_SYS_SPU(kill)
+SYSCALL_SPU(rename)
+COMPAT_SYS_SPU(mkdir)
+SYSCALL_SPU(rmdir)
+SYSCALL_SPU(dup)
+SYSCALL_SPU(pipe)
+COMPAT_SYS_SPU(times)
 SYSCALL(ni_syscall)
-SYSCALL(brk)
-SYSCALL(setgid)
-SYSCALL(getgid)
+SYSCALL_SPU(brk)
+SYSCALL_SPU(setgid)
+SYSCALL_SPU(getgid)
 SYSCALL(signal)
-SYSCALL(geteuid)
-SYSCALL(getegid)
+SYSCALL_SPU(geteuid)
+SYSCALL_SPU(getegid)
 SYSCALL(acct)
 SYSCALL(umount)
 SYSCALL(ni_syscall)
-COMPAT_SYS(ioctl)
-COMPAT_SYS(fcntl)
+COMPAT_SYS_SPU(ioctl)
+COMPAT_SYS_SPU(fcntl)
 SYSCALL(ni_syscall)
-COMPAT_SYS(setpgid)
+COMPAT_SYS_SPU(setpgid)
 SYSCALL(ni_syscall)
 SYSX(sys_ni_syscall,sys_olduname, sys_olduname)
-COMPAT_SYS(umask)
-SYSCALL(chroot)
+COMPAT_SYS_SPU(umask)
+SYSCALL_SPU(chroot)
 SYSCALL(ustat)
-SYSCALL(dup2)
-SYSCALL(getppid)
-SYSCALL(getpgrp)
-SYSCALL(setsid)
+SYSCALL_SPU(dup2)
+SYSCALL_SPU(getppid)
+SYSCALL_SPU(getpgrp)
+SYSCALL_SPU(setsid)
 SYS32ONLY(sigaction)
-SYSCALL(sgetmask)
-COMPAT_SYS(ssetmask)
-SYSCALL(setreuid)
-SYSCALL(setregid)
+SYSCALL_SPU(sgetmask)
+COMPAT_SYS_SPU(ssetmask)
+SYSCALL_SPU(setreuid)
+SYSCALL_SPU(setregid)
 SYS32ONLY(sigsuspend)
 COMPAT_SYS(sigpending)
-COMPAT_SYS(sethostname)
-COMPAT_SYS(setrlimit)
+COMPAT_SYS_SPU(sethostname)
+COMPAT_SYS_SPU(setrlimit)
 COMPAT_SYS(old_getrlimit)
-COMPAT_SYS(getrusage)
-COMPAT_SYS(gettimeofday)
-COMPAT_SYS(settimeofday)
-COMPAT_SYS(getgroups)
-COMPAT_SYS(setgroups)
+COMPAT_SYS_SPU(getrusage)
+COMPAT_SYS_SPU(gettimeofday)
+COMPAT_SYS_SPU(settimeofday)
+COMPAT_SYS_SPU(getgroups)
+COMPAT_SYS_SPU(setgroups)
 SYSX(sys_ni_syscall,sys_ni_syscall,ppc_select)
-SYSCALL(symlink)
+SYSCALL_SPU(symlink)
 OLDSYS(lstat)
-COMPAT_SYS(readlink)
+COMPAT_SYS_SPU(readlink)
 SYSCALL(uselib)
 SYSCALL(swapon)
 SYSCALL(reboot)
 SYSX(sys_ni_syscall,old32_readdir,old_readdir)
-SYSCALL(mmap)
-SYSCALL(munmap)
-SYSCALL(truncate)
-SYSCALL(ftruncate)
-SYSCALL(fchmod)
-SYSCALL(fchown)
-COMPAT_SYS(getpriority)
-COMPAT_SYS(setpriority)
+SYSCALL_SPU(mmap)
+SYSCALL_SPU(munmap)
+SYSCALL_SPU(truncate)
+SYSCALL_SPU(ftruncate)
+SYSCALL_SPU(fchmod)
+SYSCALL_SPU(fchown)
+COMPAT_SYS_SPU(getpriority)
+COMPAT_SYS_SPU(setpriority)
 SYSCALL(ni_syscall)
 COMPAT_SYS(statfs)
 COMPAT_SYS(fstatfs)
 SYSCALL(ni_syscall)
-COMPAT_SYS(socketcall)
-COMPAT_SYS(syslog)
-COMPAT_SYS(setitimer)
-COMPAT_SYS(getitimer)
-COMPAT_SYS(newstat)
-COMPAT_SYS(newlstat)
-COMPAT_SYS(newfstat)
+COMPAT_SYS_SPU(socketcall)
+COMPAT_SYS_SPU(syslog)
+COMPAT_SYS_SPU(setitimer)
+COMPAT_SYS_SPU(getitimer)
+COMPAT_SYS_SPU(newstat)
+COMPAT_SYS_SPU(newlstat)
+COMPAT_SYS_SPU(newfstat)
 SYSX(sys_ni_syscall,sys_uname,sys_uname)
 SYSCALL(ni_syscall)
-SYSCALL(vhangup)
+SYSCALL_SPU(vhangup)
 SYSCALL(ni_syscall)
 SYSCALL(ni_syscall)
-COMPAT_SYS(wait4)
+COMPAT_SYS_SPU(wait4)
 SYSCALL(swapoff)
-COMPAT_SYS(sysinfo)
+COMPAT_SYS_SPU(sysinfo)
 COMPAT_SYS(ipc)
-SYSCALL(fsync)
+SYSCALL_SPU(fsync)
 SYS32ONLY(sigreturn)
 PPC_SYS(clone)
-COMPAT_SYS(setdomainname)
-PPC_SYS(newuname)
+COMPAT_SYS_SPU(setdomainname)
+PPC_SYS_SPU(newuname)
 SYSCALL(ni_syscall)
-COMPAT_SYS(adjtimex)
-SYSCALL(mprotect)
+COMPAT_SYS_SPU(adjtimex)
+SYSCALL_SPU(mprotect)
 SYSX(sys_ni_syscall,compat_sys_sigprocmask,sys_sigprocmask)
 SYSCALL(ni_syscall)
 SYSCALL(init_module)
 SYSCALL(delete_module)
 SYSCALL(ni_syscall)
 SYSCALL(quotactl)
-COMPAT_SYS(getpgid)
-SYSCALL(fchdir)
-SYSCALL(bdflush)
+COMPAT_SYS_SPU(getpgid)
+SYSCALL_SPU(fchdir)
+SYSCALL_SPU(bdflush)
 COMPAT_SYS(sysfs)
-SYSX(ppc64_personality,ppc64_personality,sys_personality)
+SYSX_SPU(ppc64_personality,ppc64_personality,sys_personality)
 SYSCALL(ni_syscall)
-SYSCALL(setfsuid)
-SYSCALL(setfsgid)
-SYSCALL(llseek)
-COMPAT_SYS(getdents)
-SYSX(sys_select,ppc32_select,ppc_select)
-SYSCALL(flock)
-SYSCALL(msync)
-COMPAT_SYS(readv)
-COMPAT_SYS(writev)
-COMPAT_SYS(getsid)
-SYSCALL(fdatasync)
+SYSCALL_SPU(setfsuid)
+SYSCALL_SPU(setfsgid)
+SYSCALL_SPU(llseek)
+COMPAT_SYS_SPU(getdents)
+SYSX_SPU(sys_select,ppc32_select,ppc_select)
+SYSCALL_SPU(flock)
+SYSCALL_SPU(msync)
+COMPAT_SYS_SPU(readv)
+COMPAT_SYS_SPU(writev)
+COMPAT_SYS_SPU(getsid)
+SYSCALL_SPU(fdatasync)
 COMPAT_SYS(sysctl)
-SYSCALL(mlock)
-SYSCALL(munlock)
-SYSCALL(mlockall)
-SYSCALL(munlockall)
-COMPAT_SYS(sched_setparam)
-COMPAT_SYS(sched_getparam)
-COMPAT_SYS(sched_setscheduler)
-COMPAT_SYS(sched_getscheduler)
-SYSCALL(sched_yield)
-COMPAT_SYS(sched_get_priority_max)
-COMPAT_SYS(sched_get_priority_min)
-COMPAT_SYS(sched_rr_get_interval)
-COMPAT_SYS(nanosleep)
-SYSCALL(mremap)
-SYSCALL(setresuid)
-SYSCALL(getresuid)
+SYSCALL_SPU(mlock)
+SYSCALL_SPU(munlock)
+SYSCALL_SPU(mlockall)
+SYSCALL_SPU(munlockall)
+COMPAT_SYS_SPU(sched_setparam)
+COMPAT_SYS_SPU(sched_getparam)
+COMPAT_SYS_SPU(sched_setscheduler)
+COMPAT_SYS_SPU(sched_getscheduler)
+SYSCALL_SPU(sched_yield)
+COMPAT_SYS_SPU(sched_get_priority_max)
+COMPAT_SYS_SPU(sched_get_priority_min)
+COMPAT_SYS_SPU(sched_rr_get_interval)
+COMPAT_SYS_SPU(nanosleep)
+SYSCALL_SPU(mremap)
+SYSCALL_SPU(setresuid)
+SYSCALL_SPU(getresuid)
 SYSCALL(ni_syscall)
-SYSCALL(poll)
+SYSCALL_SPU(poll)
 COMPAT_SYS(nfsservctl)
-SYSCALL(setresgid)
-SYSCALL(getresgid)
-COMPAT_SYS(prctl)
+SYSCALL_SPU(setresgid)
+SYSCALL_SPU(getresgid)
+COMPAT_SYS_SPU(prctl)
 COMPAT_SYS(rt_sigreturn)
 COMPAT_SYS(rt_sigaction)
 COMPAT_SYS(rt_sigprocmask)
@@ -218,19 +230,19 @@ COMPAT_SYS(rt_sigpending)
 COMPAT_SYS(rt_sigtimedwait)
 COMPAT_SYS(rt_sigqueueinfo)
 COMPAT_SYS(rt_sigsuspend)
-COMPAT_SYS(pread64)
-COMPAT_SYS(pwrite64)
-SYSCALL(chown)
-SYSCALL(getcwd)
-SYSCALL(capget)
-SYSCALL(capset)
+COMPAT_SYS_SPU(pread64)
+COMPAT_SYS_SPU(pwrite64)
+SYSCALL_SPU(chown)
+SYSCALL_SPU(getcwd)
+SYSCALL_SPU(capget)
+SYSCALL_SPU(capset)
 COMPAT_SYS(sigaltstack)
-SYSX(sys_sendfile64,compat_sys_sendfile,sys_sendfile)
+SYSX_SPU(sys_sendfile64,compat_sys_sendfile,sys_sendfile)
 SYSCALL(ni_syscall)
 SYSCALL(ni_syscall)
 PPC_SYS(vfork)
-COMPAT_SYS(getrlimit)
-COMPAT_SYS(readahead)
+COMPAT_SYS_SPU(getrlimit)
+COMPAT_SYS_SPU(readahead)
 SYS32ONLY(mmap2)
 SYS32ONLY(truncate64)
 SYS32ONLY(ftruncate64)
@@ -241,60 +253,60 @@ SYSCALL(pciconfig_read)
 SYSCALL(pciconfig_write)
 SYSCALL(pciconfig_iobase)
 SYSCALL(ni_syscall)
-SYSCALL(getdents64)
-SYSCALL(pivot_root)
+SYSCALL_SPU(getdents64)
+SYSCALL_SPU(pivot_root)
 SYSX(sys_ni_syscall,compat_sys_fcntl64,sys_fcntl64)
-SYSCALL(madvise)
-SYSCALL(mincore)
-SYSCALL(gettid)
-SYSCALL(tkill)
-SYSCALL(setxattr)
-SYSCALL(lsetxattr)
-SYSCALL(fsetxattr)
-SYSCALL(getxattr)
-SYSCALL(lgetxattr)
-SYSCALL(fgetxattr)
-SYSCALL(listxattr)
-SYSCALL(llistxattr)
-SYSCALL(flistxattr)
-SYSCALL(removexattr)
-SYSCALL(lremovexattr)
-SYSCALL(fremovexattr)
-COMPAT_SYS(futex)
-COMPAT_SYS(sched_setaffinity)
-COMPAT_SYS(sched_getaffinity)
+SYSCALL_SPU(madvise)
+SYSCALL_SPU(mincore)
+SYSCALL_SPU(gettid)
+SYSCALL_SPU(tkill)
+SYSCALL_SPU(setxattr)
+SYSCALL_SPU(lsetxattr)
+SYSCALL_SPU(fsetxattr)
+SYSCALL_SPU(getxattr)
+SYSCALL_SPU(lgetxattr)
+SYSCALL_SPU(fgetxattr)
+SYSCALL_SPU(listxattr)
+SYSCALL_SPU(llistxattr)
+SYSCALL_SPU(flistxattr)
+SYSCALL_SPU(removexattr)
+SYSCALL_SPU(lremovexattr)
+SYSCALL_SPU(fremovexattr)
+COMPAT_SYS_SPU(futex)
+COMPAT_SYS_SPU(sched_setaffinity)
+COMPAT_SYS_SPU(sched_getaffinity)
 SYSCALL(ni_syscall)
 SYSCALL(ni_syscall)
 SYS32ONLY(sendfile64)
-COMPAT_SYS(io_setup)
-SYSCALL(io_destroy)
-COMPAT_SYS(io_getevents)
-COMPAT_SYS(io_submit)
-SYSCALL(io_cancel)
+COMPAT_SYS_SPU(io_setup)
+SYSCALL_SPU(io_destroy)
+COMPAT_SYS_SPU(io_getevents)
+COMPAT_SYS_SPU(io_submit)
+SYSCALL_SPU(io_cancel)
 SYSCALL(set_tid_address)
-SYSX(sys_fadvise64,ppc32_fadvise64,sys_fadvise64)
+SYSX_SPU(sys_fadvise64,ppc32_fadvise64,sys_fadvise64)
 SYSCALL(exit_group)
 SYSX(sys_lookup_dcookie,ppc32_lookup_dcookie,sys_lookup_dcookie)
-SYSCALL(epoll_create)
-SYSCALL(epoll_ctl)
-SYSCALL(epoll_wait)
-SYSCALL(remap_file_pages)
-SYSX(sys_timer_create,compat_sys_timer_create,sys_timer_create)
-COMPAT_SYS(timer_settime)
-COMPAT_SYS(timer_gettime)
-SYSCALL(timer_getoverrun)
-SYSCALL(timer_delete)
-COMPAT_SYS(clock_settime)
-COMPAT_SYS(clock_gettime)
-COMPAT_SYS(clock_getres)
-COMPAT_SYS(clock_nanosleep)
+SYSCALL_SPU(epoll_create)
+SYSCALL_SPU(epoll_ctl)
+SYSCALL_SPU(epoll_wait)
+SYSCALL_SPU(remap_file_pages)
+SYSX_SPU(sys_timer_create,compat_sys_timer_create,sys_timer_create)
+COMPAT_SYS_SPU(timer_settime)
+COMPAT_SYS_SPU(timer_gettime)
+SYSCALL_SPU(timer_getoverrun)
+SYSCALL_SPU(timer_delete)
+COMPAT_SYS_SPU(clock_settime)
+COMPAT_SYS_SPU(clock_gettime)
+COMPAT_SYS_SPU(clock_getres)
+COMPAT_SYS_SPU(clock_nanosleep)
 SYSX(ppc64_swapcontext,ppc32_swapcontext,ppc_swapcontext)
-COMPAT_SYS(tgkill)
-COMPAT_SYS(utimes)
-COMPAT_SYS(statfs64)
-COMPAT_SYS(fstatfs64)
+COMPAT_SYS_SPU(tgkill)
+COMPAT_SYS_SPU(utimes)
+COMPAT_SYS_SPU(statfs64)
+COMPAT_SYS_SPU(fstatfs64)
 SYSX(sys_ni_syscall, ppc_fadvise64_64, ppc_fadvise64_64)
-PPC_SYS(rtas)
+PPC_SYS_SPU(rtas)
 OLDSYS(debug_setcontext)
 SYSCALL(ni_syscall)
 SYSCALL(ni_syscall)
@@ -321,11 +333,6 @@ SYSCALL(spu_run)
 SYSCALL(spu_create)
 COMPAT_SYS(pselect6)
 COMPAT_SYS(ppoll)
-SYSCALL(unshare)
-SYSCALL(splice)
-SYSCALL(tee)
-
-/*
- * please add new calls to arch/powerpc/platforms/cell/spu_callbacks.c
- * as well when appropriate.
- */
+SYSCALL_SPU(unshare)
+SYSCALL_SPU(splice)
+SYSCALL_SPU(tee)

-- 
dwmw2

^ permalink raw reply related

* Re: [PATCH] Wire up *at syscalls
From: Andreas Schwab @ 2006-04-25  0:03 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linuxppc-dev
In-Reply-To: <200604250131.22154.arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> writes:

> Simply removing the #ifdef looks wrong. AFAICT, powerpc and ia64 are the
> only ones that want both sys_newfstatat and sys_fstatat64, maybe you
> can simply do 
>
> #if !defined(__ARCH_WANT_STAT64) || defined(__ARCH_WANT_NEWSTATFSAT)

Updated patch below.

Andreas.

Signed-off-by: Andreas Schwab <schwab@suse.de>

---
 arch/powerpc/kernel/systbl.S                |   13 +++++++++++++
 arch/powerpc/platforms/cell/spu_callbacks.c |   13 +++++++++++++
 fs/stat.c                                   |    2 +-
 include/asm-powerpc/unistd.h                |   20 +++++++++++++++++++-
 4 files changed, 46 insertions(+), 2 deletions(-)

Index: linux-2.6.17-rc2-git5/arch/powerpc/kernel/systbl.S
===================================================================
--- linux-2.6.17-rc2-git5.orig/arch/powerpc/kernel/systbl.S	2006-04-24 16:24:47.000000000 +0200
+++ linux-2.6.17-rc2-git5/arch/powerpc/kernel/systbl.S	2006-04-24 19:53:50.000000000 +0200
@@ -324,6 +324,19 @@ COMPAT_SYS(ppoll)
 SYSCALL(unshare)
 SYSCALL(splice)
 SYSCALL(tee)
+COMPAT_SYS(openat)
+SYSCALL(mkdirat)
+SYSCALL(mknodat)
+SYSCALL(fchownat)
+COMPAT_SYS(futimesat)
+SYSX(sys_newfstatat, sys_fstatat64, sys_fstatat64)
+SYSCALL(unlinkat)
+SYSCALL(renameat)
+SYSCALL(linkat)
+SYSCALL(symlinkat)
+SYSCALL(readlinkat)
+SYSCALL(fchmodat)
+SYSCALL(faccessat)
 
 /*
  * please add new calls to arch/powerpc/platforms/cell/spu_callbacks.c
Index: linux-2.6.17-rc2-git5/include/asm-powerpc/unistd.h
===================================================================
--- linux-2.6.17-rc2-git5.orig/include/asm-powerpc/unistd.h	2006-04-24 16:24:15.000000000 +0200
+++ linux-2.6.17-rc2-git5/include/asm-powerpc/unistd.h	2006-04-25 01:35:38.000000000 +0200
@@ -303,8 +303,25 @@
 #define __NR_unshare		282
 #define __NR_splice		283
 #define __NR_tee		284
+#define __NR_openat		285
+#define __NR_mkdirat		286
+#define __NR_mknodat		287
+#define __NR_fchownat		288
+#define __NR_futimesat		289
+#ifdef __powerpc64__
+#define __NR_newfstatat		290
+#else
+#define __NR_fstatat64		290
+#endif
+#define __NR_unlinkat		291
+#define __NR_renameat		292
+#define __NR_linkat		293
+#define __NR_symlinkat		294
+#define __NR_readlinkat		295
+#define __NR_fchmodat		296
+#define __NR_faccessat		297
 
-#define __NR_syscalls		285
+#define __NR_syscalls		298
 
 #ifdef __KERNEL__
 #define __NR__exit __NR_exit
@@ -457,6 +474,7 @@ type name(type1 arg1, type2 arg2, type3 
 #ifdef CONFIG_PPC64
 #define __ARCH_WANT_COMPAT_SYS_TIME
 #define __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND
+#define __ARCH_WANT_SYS_NEWFSTATAT
 #endif
 
 /*
Index: linux-2.6.17-rc2-git5/arch/powerpc/platforms/cell/spu_callbacks.c
===================================================================
--- linux-2.6.17-rc2-git5.orig/arch/powerpc/platforms/cell/spu_callbacks.c	2006-04-24 16:24:47.000000000 +0200
+++ linux-2.6.17-rc2-git5/arch/powerpc/platforms/cell/spu_callbacks.c	2006-04-24 23:51:16.000000000 +0200
@@ -318,6 +318,19 @@ void *spu_syscall_table[] = {
 	[__NR_unshare]			sys_unshare,
 	[__NR_splice]			sys_splice,
 	[__NR_tee]			sys_tee,
+	[__NR_openat]			sys_openat,
+	[__NR_mkdirat]			sys_mkdirat,
+	[__NR_mknodat]			sys_mknodat,
+	[__NR_fchownat]			sys_fchownat,
+	[__NR_futimesat]		sys_futimesat,
+	[__NR_newfstatat]		sys_newfstatat,
+	[__NR_unlinkat]			sys_unlinkat,
+	[__NR_renameat]			sys_renameat,
+	[__NR_linkat]			sys_linkat,
+	[__NR_symlinkat]		sys_symlinkat,
+	[__NR_readlinkat]		sys_readlinkat,
+	[__NR_fchmodat]			sys_fchmodat,
+	[__NR_faccessat]		sys_faccessat,
 };
 
 long spu_sys_callback(struct spu_syscall_block *s)
Index: linux-2.6.17-rc2-git5/fs/stat.c
===================================================================
--- linux-2.6.17-rc2-git5.orig/fs/stat.c	2006-04-24 18:05:23.000000000 +0200
+++ linux-2.6.17-rc2-git5/fs/stat.c	2006-04-25 01:37:06.000000000 +0200
@@ -261,7 +261,7 @@ asmlinkage long sys_newlstat(char __user
 	return error;
 }
 
-#ifndef __ARCH_WANT_STAT64
+#if !defined(__ARCH_WANT_STAT64) || defined(__ARCH_WANT_SYS_NEWFSTATAT)
 asmlinkage long sys_newfstatat(int dfd, char __user *filename,
 				struct stat __user *statbuf, int flag)
 {

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH] Wire up *at syscalls
From: Arnd Bergmann @ 2006-04-24 23:31 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <jeu08ippb9.fsf@sykes.suse.de>

QW0gVHVlc2RheSAyNSBBcHJpbCAyMDA2IDAwOjQzIHNjaHJpZWIgQW5kcmVhcyBTY2h3YWI6Cj4g
SW5kZXg6IGxpbnV4LTIuNi4xNy1yYzItZ2l0NS9hcmNoL3Bvd2VycGMvcGxhdGZvcm1zL2NlbGwv
c3B1X2NhbGxiYWNrcy5jCj4gPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PQo+IC0tLQo+IGxpbnV4LTIuNi4xNy1yYzItZ2l0
NS5vcmlnL2FyY2gvcG93ZXJwYy9wbGF0Zm9ybXMvY2VsbC9zcHVfY2FsbGJhY2tzLmOgoKCgoAo+
oDIwMDYtMDQtMjQgMTY6MjQ6NDcuMDAwMDAwMDAwICswMjAwICsrKwo+IGxpbnV4LTIuNi4xNy1y
YzItZ2l0NS9hcmNoL3Bvd2VycGMvcGxhdGZvcm1zL2NlbGwvc3B1X2NhbGxiYWNrcy5joKCgMjAw
Ni0wNAo+LTI0IDE4OjA3OjU5LjAwMDAwMDAwMCArMDIwMCBAQCAtMzE4LDYgKzMxOCwxOSBAQCB2
b2lkICpzcHVfc3lzY2FsbF90YWJsZVtdCj4gPSB7Cj4goKCgoKCgoKBbX19OUl91bnNoYXJlXaCg
oKCgoKCgoKCgoKCgoKCgoHN5c191bnNoYXJlLAo+IKCgoKCgoKCgW19fTlJfc3BsaWNlXaCgoKCg
oKCgoKCgoKCgoKCgoKBzeXNfc3BsaWNlLAo+IKCgoKCgoKCgW19fTlJfdGVlXaCgoKCgoKCgoKCg
oKCgoKCgoKCgoKBzeXNfdGVlLAo+ICugoKCgoKCgW19fTlJfb3BlbmF0XaCgoKCgoKCgoKCgoKCg
oKCgoKBzeXNfb3BlbmF0LAo+ICugoKCgoKCgW19fTlJfbWtkaXJhdF2goKCgoKCgoKCgoKCgoKCg
oKBzeXNfbWtkaXJhdCwKPiAroKCgoKCgoFtfX05SX21rbm9kYXRdoKCgoKCgoKCgoKCgoKCgoKCg
c3lzX21rbm9kYXQsCj4gK6CgoKCgoKBbX19OUl9mY2hvd25hdF2goKCgoKCgoKCgoKCgoKCgoHN5
c19mY2hvd25hdCwKPiAroKCgoKCgoFtfX05SX2Z1dGltZXNhdF2goKCgoKCgoKCgoKCgoKCgc3lz
X2Z1dGltZXNhdCwKPiAroKCgoKCgoFtfX05SX25ld2ZzdGF0YXRdoKCgoKCgoKCgoKCgoKCgc3lz
X25ld2ZzdGF0YXQsCj4gK6CgoKCgoKBbX19OUl91bmxpbmthdF2goKCgoKCgoKCgoKCgoKCgoHN5
c191bmxpbmthdCwKPiAroKCgoKCgoFtfX05SX3JlbmFtZWF0XaCgoKCgoKCgoKCgoKCgoKCgc3lz
X3JlbmFtZWF0LAo+ICugoKCgoKCgW19fTlJfbGlua2F0XaCgoKCgoKCgoKCgoKCgoKCgoKBzeXNf
bGlua2F0LAo+ICugoKCgoKCgW19fTlJfc3ltbGlua2F0XaCgoKCgoKCgoKCgoKCgoKBzeXNfc3lt
bGlua2F0LAo+ICugoKCgoKCgW19fTlJfcmVhZGxpbmthdF2goKCgoKCgoKCgoKCgoKBzeXNfcmVh
ZGxpbmthdCwKPiAroKCgoKCgoFtfX05SX2ZjaG1vZGF0XaCgoKCgoKCgoKCgoKCgoKCgc3lzX2Zj
aG1vZGF0LAo+ICugoKCgoKCgW19fTlJfZmFjY2Vzc2F0XaCgoKCgoKCgoKCgoKCgoKBzeXNfZmFj
Y2Vzc2F0LAo+IKB9Owo+IKAKClRoZSBzcHUgcGFydHMgbG9vayBmaW5lLCB0aGFua3MuCgo+IKBs
b25nIHNwdV9zeXNfY2FsbGJhY2soc3RydWN0IHNwdV9zeXNjYWxsX2Jsb2NrICpzKQo+IEluZGV4
OiBsaW51eC0yLjYuMTctcmMyLWdpdDUvZnMvc3RhdC5jCj4gPT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQo+IC0tLSBsaW51
eC0yLjYuMTctcmMyLWdpdDUub3JpZy9mcy9zdGF0LmOgoKCgoKCgoDIwMDYtMDQtMjQKPiAxODow
NToyMy4wMDAwMDAwMDAgKzAyMDAgKysrIGxpbnV4LTIuNi4xNy1yYzItZ2l0NS9mcy9zdGF0LmOg
oKCgoDIwMDYtMDQtMjQKPiAxODowNTo0NC4wMDAwMDAwMDAgKzAyMDAgQEAgLTI2MSw3ICsyNjEs
NiBAQCBhc21saW5rYWdlIGxvbmcKPiBzeXNfbmV3bHN0YXQoY2hhciBfX3VzZXIKPiCgoKCgoKCg
oHJldHVybiBlcnJvcjsKPiCgfQo+IKAKPiAtI2lmbmRlZiBfX0FSQ0hfV0FOVF9TVEFUNjQKPiCg
YXNtbGlua2FnZSBsb25nIHN5c19uZXdmc3RhdGF0KGludCBkZmQsIGNoYXIgX191c2VyICpmaWxl
bmFtZSwKPiCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoHN0cnVjdCBzdGF0IF9fdXNl
ciAqc3RhdGJ1ZiwgaW50IGZsYWcpCj4goHsKPiBAQCAtMjgyLDcgKzI4MSw2IEBAIGFzbWxpbmth
Z2UgbG9uZyBzeXNfbmV3ZnN0YXRhdChpbnQgZGZkLAo+IKBvdXQ6Cj4goKCgoKCgoKByZXR1cm4g
ZXJyb3I7Cj4goH0KPiAtI2VuZGlmCj4goAo+IKBhc21saW5rYWdlIGxvbmcgc3lzX25ld2ZzdGF0
KHVuc2lnbmVkIGludCBmZCwgc3RydWN0IHN0YXQgX191c2VyICpzdGF0YnVmKQo+IKB7CgpTaW1w
bHkgcmVtb3ZpbmcgdGhlICNpZmRlZiBsb29rcyB3cm9uZy4gQUZBSUNULCBwb3dlcnBjIGFuZCBp
YTY0IGFyZSB0aGUKb25seSBvbmVzIHRoYXQgd2FudCBib3RoIHN5c19uZXdmc3RhdGF0IGFuZCBz
eXNfZnN0YXRhdDY0LCBtYXliZSB5b3UKY2FuIHNpbXBseSBkbyAKCiNpZiAhZGVmaW5lZChfX0FS
Q0hfV0FOVF9TVEFUNjQpIHx8IGRlZmluZWQoX19BUkNIX1dBTlRfTkVXU1RBVEZTQVQpCgoJQXJu
ZCA8PjwK

^ permalink raw reply

* Re: alignment exceptionhandler sleeps in invalid context
From: Paul Mackerras @ 2006-04-24 23:12 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev
In-Reply-To: <20060424173246.GA22318@suse.de>

Olaf Hering writes:

> I'm not sure where the bug is. Does it mean the network stack does
> something nasty, or is the exception handler itself broken? (probably the latter)
> This is 2.6.16.9 on a p270.

The alignment handler does a get_user to read the instruction.  I
suppose it will have to read the instruction directly if the exception
occurred in kernel mode.

Paul.

^ permalink raw reply

* Re: Xilinx Virtex-2 PRO FPGA ppc 405 on ML310 board
From: Aidan Williams @ 2006-04-24 23:02 UTC (permalink / raw)
  To: Vincent Winstead; +Cc: linuxppc-embedded list
In-Reply-To: <20060424203047.86885.qmail@web52005.mail.yahoo.com>


After generating a new auto-config.in using the BSP,
you'll need to:

   0. put new auto-config.in into kernel directory
      linux-2.4.x/arch/ppc/platforms/xilinx_auto

   1. change to the top uClinux-dist directory
      (*NOT* the kernel directory)

   2. In "Vendor/Product Selection  --->"
      Choose "Xilinx"
      Choose "powerpc-auto"

   3. In "Kernel/Library/Defaults Selection  --->"
      Choose linux-2.4.x as your kernel, making sure that
      this is a symlink to or copy of the linuxppc-2.4
      tarball from UQ.

   4. In "Kernel/Library/Defaults Selection  --->"
      Choose "[*] Default all settings (lose changes)"

   5. Save and quit..

This process will pull in default Xilinx/powerpc config
files from vendors/Xilinx/powerpc-auto/config.*
for the kernel and for other uClinux components.

It will also generate a new linux-2.4.x/.config
based on your auto-config.in and the defaults
in vendors/Xilinx/powerpc-auto/config.linux-2.4.x.

btw, I don't have an ML310 board.  I have however gotten
the UQ kernel + uclinux to work on a v2pro FF1152 board
and a virtex-4 FX-12 minimodule.

another btw, the linuxppc-2.4 code has an auto-config.in
inside it which, if you follow the above steps (without
doing step 0), should compile.  The supplied auto-config.in
won't match your board, but it should at least compile.

- aidan


Vincent Winstead wrote:
> guess what - I got the BSP to work!  But now I have a problem with the 
> make command.  I put the auto-config.in where it needs to be then I did 
> a make menuconfig to configure it, then I did a make dep, but when I did 
> a make is when  an error came up that I can't seem to figure out:
> 
> In file included from 
> /home/ml310_linux/uClinux-dist/linuxppc-2.4/include/linux/pagemap.h:16,
>                  from 
> /home/ml310_linux/uClinux-dist/linuxppc-2.4/include/linux/locks.h:8,
>                  from 
> /home/ml310_linux/uClinux-dist/linuxppc-2.4/include/linux/blk.h:5,
>                  from init/main.c:25:
> /home/ml310_linux/uClinux-dist/linuxppc-2.4/include/linux/highmem.h: In 
> function `kmap':
> /home/ml310_linux/uClinux-dist/linuxppc-2.4/include/linux/highmem.h:68: 
> error: `CONFIG_KERNEL_START' undeclared (first use in this function)
> init/main.c: In function `start_kernel':
> init/main.c:393: error: `CONFIG_KERNEL_START' undeclared (first use in 
> this function)
> make[1]: *** [init/main.o] Error 1
> make[1]: Leaving directory `/home/ml310_linux/uClinux-dist/linuxppc-2.4'
> make: *** [linux] Error 1
> 
> This CONFIG_KERNEL_START is the problem.  It doesn't seem to be defined 
> anywhere and I guess it needs to be.  Is this something I need to get 
> from somewhere?  Or is it maybe generated along with the BSP so I would 
> have to put a start number into the platform Studio configuration?
> 
> -Vincent
> 
> 
> */Aidan Williams /* wrote:
> 
> 
>     Vincent Winstead wrote:
>      > Now, as far as step 5, am I supposed to have a symbolic link that is
>      > named linux-2.4.x placed into the uClinux-dist directory? Because
>      > there's already a folder named linux-2.4.x which was in there
>     already
>      > when I untarred everything. At the command prompt in the
>     uClinux-dist
>      > directory I entered the following line:
>      >
>      > ln -s ../linuxppc-2.4 linux-2.4.x
>      >
>      > and the result of this operation was to put a symbolic link into my
>      > linuxppc-2.4 directory with the name of linux-2.4.x - is this
>     correct?
>      >
> 
>     First, you'll need to move the existing directory aside using
>     a command like:
> 
>     mv linux-2.4.x linux-2.4.x-dist
> 
>     and then re-run the ln -s command above.
> 
>      > Now on to Step 6 problem.
>      > How am I supposed to make use uClinux EDK Board Support Package 1.0
>      > files? I'm not sure how to go about using them in the Xilinx
>     Platform
>      > Studio in order to generate the necessary auto-config.in file.
>      >
> 
>     See the document below for the general approach:
> 
>      > Even though it is about the microblaze rather than
>      > the PPC, a helpful "getting started" document is:
>      >
>     http://www.itee.uq.edu.au/~wu/downloads/uClinux_ready_Microblaze_design.pdf
>      >
> 
>     Look particularly at the section "Software Platform Settings"
>     on page 29, steps 67,68.
> 
>     If you are not overly familiar with the EDK, it would
>     be best to find someone locally who can help walk you
>     through the process of generating a system.
> 
>     - aidan
> 
> 
>     --------------------------------------------------------------------------
>     This email and any attachments may be confidential. They may contain
>     legally
>     privileged information or copyright material. You should not read, copy,
>     use or disclose them without authorisation. If you are not an intended
>     recipient, please contact us at once by return email and then delete
>     both
>     messages. We do not accept liability in connection with computer virus,
>     data corruption, delay, interruption, unauthorised access or
>     unauthorised
>     amendment. This notice should not be removed.
> 
> 

^ permalink raw reply

* [PATCH] Wire up *at syscalls
From: Andreas Schwab @ 2006-04-24 22:43 UTC (permalink / raw)
  To: linuxppc-dev

This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).

Signed-off-by: Andreas Schwab <schwab@suse.de>

---
 arch/powerpc/kernel/systbl.S                |   13 +++++++++++++
 arch/powerpc/platforms/cell/spu_callbacks.c |   13 +++++++++++++
 fs/stat.c                                   |    2 --
 include/asm-powerpc/unistd.h                |   19 ++++++++++++++++++-
 4 files changed, 44 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc2-git5/arch/powerpc/kernel/systbl.S
===================================================================
--- linux-2.6.17-rc2-git5.orig/arch/powerpc/kernel/systbl.S	2006-04-24 16:24:47.000000000 +0200
+++ linux-2.6.17-rc2-git5/arch/powerpc/kernel/systbl.S	2006-04-24 19:53:50.000000000 +0200
@@ -324,6 +324,19 @@ COMPAT_SYS(ppoll)
 SYSCALL(unshare)
 SYSCALL(splice)
 SYSCALL(tee)
+COMPAT_SYS(openat)
+SYSCALL(mkdirat)
+SYSCALL(mknodat)
+SYSCALL(fchownat)
+COMPAT_SYS(futimesat)
+SYSX(sys_newfstatat, sys_fstatat64, sys_fstatat64)
+SYSCALL(unlinkat)
+SYSCALL(renameat)
+SYSCALL(linkat)
+SYSCALL(symlinkat)
+SYSCALL(readlinkat)
+SYSCALL(fchmodat)
+SYSCALL(faccessat)
 
 /*
  * please add new calls to arch/powerpc/platforms/cell/spu_callbacks.c
Index: linux-2.6.17-rc2-git5/include/asm-powerpc/unistd.h
===================================================================
--- linux-2.6.17-rc2-git5.orig/include/asm-powerpc/unistd.h	2006-04-24 16:24:15.000000000 +0200
+++ linux-2.6.17-rc2-git5/include/asm-powerpc/unistd.h	2006-04-24 18:06:53.000000000 +0200
@@ -303,8 +303,25 @@
 #define __NR_unshare		282
 #define __NR_splice		283
 #define __NR_tee		284
+#define __NR_openat		285
+#define __NR_mkdirat		286
+#define __NR_mknodat		287
+#define __NR_fchownat		288
+#define __NR_futimesat		289
+#ifdef __powerpc64__
+#define __NR_newfstatat		290
+#else
+#define __NR_fstatat64		290
+#endif
+#define __NR_unlinkat		291
+#define __NR_renameat		292
+#define __NR_linkat		293
+#define __NR_symlinkat		294
+#define __NR_readlinkat		295
+#define __NR_fchmodat		296
+#define __NR_faccessat		297
 
-#define __NR_syscalls		285
+#define __NR_syscalls		298
 
 #ifdef __KERNEL__
 #define __NR__exit __NR_exit
Index: linux-2.6.17-rc2-git5/arch/powerpc/platforms/cell/spu_callbacks.c
===================================================================
--- linux-2.6.17-rc2-git5.orig/arch/powerpc/platforms/cell/spu_callbacks.c	2006-04-24 16:24:47.000000000 +0200
+++ linux-2.6.17-rc2-git5/arch/powerpc/platforms/cell/spu_callbacks.c	2006-04-24 18:07:59.000000000 +0200
@@ -318,6 +318,19 @@ void *spu_syscall_table[] = {
 	[__NR_unshare]			sys_unshare,
 	[__NR_splice]			sys_splice,
 	[__NR_tee]			sys_tee,
+	[__NR_openat]			sys_openat,
+	[__NR_mkdirat]			sys_mkdirat,
+	[__NR_mknodat]			sys_mknodat,
+	[__NR_fchownat]			sys_fchownat,
+	[__NR_futimesat]		sys_futimesat,
+	[__NR_newfstatat]		sys_newfstatat,
+	[__NR_unlinkat]			sys_unlinkat,
+	[__NR_renameat]			sys_renameat,
+	[__NR_linkat]			sys_linkat,
+	[__NR_symlinkat]		sys_symlinkat,
+	[__NR_readlinkat]		sys_readlinkat,
+	[__NR_fchmodat]			sys_fchmodat,
+	[__NR_faccessat]		sys_faccessat,
 };
 
 long spu_sys_callback(struct spu_syscall_block *s)
Index: linux-2.6.17-rc2-git5/fs/stat.c
===================================================================
--- linux-2.6.17-rc2-git5.orig/fs/stat.c	2006-04-24 18:05:23.000000000 +0200
+++ linux-2.6.17-rc2-git5/fs/stat.c	2006-04-24 18:05:44.000000000 +0200
@@ -261,7 +261,6 @@ asmlinkage long sys_newlstat(char __user
 	return error;
 }
 
-#ifndef __ARCH_WANT_STAT64
 asmlinkage long sys_newfstatat(int dfd, char __user *filename,
 				struct stat __user *statbuf, int flag)
 {
@@ -282,7 +281,6 @@ asmlinkage long sys_newfstatat(int dfd, 
 out:
 	return error;
 }
-#endif
 
 asmlinkage long sys_newfstat(unsigned int fd, struct stat __user *statbuf)
 {

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: 85xx FDT updates?
From: Dan Malek @ 2006-04-24 22:31 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <4C61B597-BD91-4D05-BB40-43DE0319F123@kernel.crashing.org>


On Apr 24, 2006, at 5:43 PM, Kumar Gala wrote:

> However, I am against removing the arch/ppc support until the u-boot
> patches are picked up.  I think its bad form to give people a kernel
> they can't easily boot.

What about systems that can't update u-boot, but want to run
a newer kernel?


	-- Dan

^ permalink raw reply

* [PATCH] Use check_legacy_ioport() on ppc32 too.
From: David Woodhouse @ 2006-04-24 22:22 UTC (permalink / raw)
  To: paulus, benh; +Cc: linuxppc-dev

Some people report that we die on some Macs when we are expecting to
catch machine checks after poking at some random I/O address. I'd seen
it happen on my dual G4 with serial ports until we fixed those to use
OF, but now other users are reporting it with i8042.

This expands the use of check_legacy_ioport() to avoid that situation
even on 32-bit kernels.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1d93e73..684ab1d 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -516,3 +516,11 @@ void probe_machine(void)
 
 	printk(KERN_INFO "Using %s machine description\n", ppc_md.name);
 }
+
+int check_legacy_ioport(unsigned long base_port)
+{
+	if (ppc_md.check_legacy_ioport == NULL)
+		return 0;
+	return ppc_md.check_legacy_ioport(base_port);
+}
+EXPORT_SYMBOL(check_legacy_ioport);
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 13e91c4..4467c49 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -594,14 +594,6 @@ void ppc64_terminate_msg(unsigned int sr
 	printk("[terminate]%04x %s\n", src, msg);
 }
 
-int check_legacy_ioport(unsigned long base_port)
-{
-	if (ppc_md.check_legacy_ioport == NULL)
-		return 0;
-	return ppc_md.check_legacy_ioport(base_port);
-}
-EXPORT_SYMBOL(check_legacy_ioport);
-
 void cpu_die(void)
 {
 	if (ppc_md.cpu_die)
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index bedb689..dff1e67 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4301,7 +4301,7 @@ #endif
 	}
 
 	use_virtual_dma = can_use_virtual_dma & 1;
-#if defined(CONFIG_PPC64)
+#if defined(CONFIG_PPC_MERGE)
 	if (check_legacy_ioport(FDC1)) {
 		del_timer(&fd_timeout);
 		err = -ENODEV;
diff --git a/drivers/input/serio/i8042-io.h b/drivers/input/serio/i8042-io.h
index 9a92216..cc21914 100644
--- a/drivers/input/serio/i8042-io.h
+++ b/drivers/input/serio/i8042-io.h
@@ -67,14 +67,14 @@ static inline int i8042_platform_init(vo
  * On some platforms touching the i8042 data register region can do really
  * bad things. Because of this the region is always reserved on such boxes.
  */
-#if !defined(__sh__) && !defined(__alpha__) && !defined(__mips__) && !defined(CONFIG_PPC64)
+#if !defined(__sh__) && !defined(__alpha__) && !defined(__mips__) && !defined(CONFIG_PPC_MERGE)
 	if (!request_region(I8042_DATA_REG, 16, "i8042"))
 		return -EBUSY;
 #endif
 
         i8042_reset = 1;
 
-#if defined(CONFIG_PPC64)
+#if defined(CONFIG_PPC_MERGE)
 	if (check_legacy_ioport(I8042_DATA_REG))
 		return -EBUSY;
 	if (!request_region(I8042_DATA_REG, 16, "i8042"))
diff --git a/include/asm-powerpc/io.h b/include/asm-powerpc/io.h
index 68efbea..f1c2469 100644
--- a/include/asm-powerpc/io.h
+++ b/include/asm-powerpc/io.h
@@ -9,6 +9,9 @@ #ifdef __KERNEL__
  * 2 of the License, or (at your option) any later version.
  */
 
+/* Check of existence of legacy devices */
+extern int check_legacy_ioport(unsigned long base_port);
+
 #ifndef CONFIG_PPC64
 #include <asm-ppc/io.h>
 #else
@@ -437,9 +440,6 @@ #define dma_cache_inv(_start,_size)		do 
 #define dma_cache_wback(_start,_size)		do { } while (0)
 #define dma_cache_wback_inv(_start,_size)	do { } while (0)
 
-/* Check of existence of legacy devices */
-extern int check_legacy_ioport(unsigned long base_port);
-
 
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem

-- 
dwmw2

^ permalink raw reply related

* Re: Clock running fast on Mac Mini?
From: Brent Cook @ 2006-04-24 22:06 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <E1FY8kd-0000mH-PO@owl.grunz.lu>

On Monday 24 April 2006 16:37, Mich Lanners wrote:
> Hello all,
>
> I have a problem on my Mac Mini with the kernel clock, which is running
> fast.
>
> In fact, it is gaining 10 ms every 5 seconds, as can be seen here, with
> ntpdate called every 5 seconds:
>

I can confirm this. I can also trigger make's 'File has future modification 
date' warning message when compiling a file on the local filesystem.

 - Brent

^ permalink raw reply

* Clock running fast on Mac Mini?
From: Mich Lanners @ 2006-04-24 21:37 UTC (permalink / raw)
  To: linuxppc-dev

Hello all,

I have a problem on my Mac Mini with the kernel clock, which is running
fast.

In fact, it is gaining 10 ms every 5 seconds, as can be seen here, with
ntpdate called every 5 seconds:

owl:~# while true; do ntpdate -u be.pool.ntp.org; sleep 5; done
24 Apr 23:14:17 ntpdate[2601]: adjust time server 195.47.215.226 offset 0.151197 sec
24 Apr 23:14:22 ntpdate[2607]: adjust time server 195.47.215.226 offset 0.160348 sec
24 Apr 23:14:28 ntpdate[2613]: adjust time server 195.47.215.226 offset 0.170311 sec
[...]
24 Apr 23:17:15 ntpdate[2813]: adjust time server 195.47.215.226 offset 0.489635 sec
24 Apr 23:17:20 ntpdate[2819]: adjust time server 195.47.215.226 offset 0.499386 sec
24 Apr 23:17:25 ntpdate[2825]: step time server 195.47.215.226 offset 0.509342 sec
24 Apr 23:17:31 ntpdate[2831]: adjust time server 195.47.215.226 offset 0.013693 sec
24 Apr 23:17:36 ntpdate[2837]: adjust time server 195.47.215.226 offset 0.021668 sec

and so on.

This is with ntpd stopped, after a fresh boot with /etc/adjtime removed
to make sure hwclock is not falsly tuning the kernel timekeeping (I did
this because I once had a 2.6.16 kernel running, where clock problems
have been reported. I thought that may have falsified ntp's drift file
and/or hwclock's /etc/adjtime).

All this is with a 2.6.15 kernel.

My Alu powerbook with a 2.6.15 kernel does not show this problem.

Anybody know what's up?

Thanks, and cheers

Michel

-------------------------------------------------------------------------
Michel Lanners                 |  " Read Philosophy.  Study Art.
23, Rue Paul Henkes            |    Ask Questions.  Make Mistakes.
L-1710 Luxembourg              |
email   mlan@cpu.lu            |
http://www.cpu.lu/~mlan        |                     Learn Always. "

^ permalink raw reply

* Rebuilding FS MDS 8349 BSP & JFFS2 integration
From: Krawczuk, Victor @ 2006-04-24 21:30 UTC (permalink / raw)
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 3374 bytes --]

Hi,

I am a newbie to Linux but not to embedded PPC. I was hoping someone
could point me to the direction here. I apologize in advance if this is
kindergarten stuff.

I downloaded the MDS-8349 Linux BSP from FreeScale. I was able to burn
the default prebuilt images "uboot" and "jffs2.img" to my MDS-8349 (PB)
board and have uboot tftp the prebuilt "uImage" to my PB. I used
"tftpboot 200000 uImage" to download and "bootm" to boot.  As per
included instructions, I had "setenv bootargs root=/dev/mtdblock1
rootfstype=jffs2 rw console=ttyS0,115200" in uboot, followed by a
"saveenv". The prebuilt Linux (uImage) came up, no problem.

As I need to include USB Host support to the PPC kernel, I used the ltib
tool to reconfigure and rebuild Linux, as well as a matching jffs2 (I
would think). Ltib produced vmlinux.gz.uboot and rootfs.jffs2 for me.
For some reason, rootfs.jffs2 was significantly smaller than the
prebuilt "jffs2.img" (2483184 bytes vs. 4325376 bytes). I'm not sure
why. I then did the following:

bootargs is still set to "root=/dev/mtdblock1 rootfstype=jffs2 rw
console=ttyS0,115200"  in uboot:

Using the same (prebuilt) uboot as before:

- tftpboot 400000 rootfs.jffs2 (rebuilt jffs2)
- erase fe020000 fe5fffff
- cp.b 400000 fe020000 25e3f0

- tftpboot 200000 vmlinux.gz.uboot (rebuilt Linux containing USB Host
stuff)
- bootm

Linux could not boot!  I got the following root FS error:

[stuff deleted...]
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 32768K size 1024 blocksize
loop: loaded (max 8 devices)
eth0: Gianfar Ethernet Controller Version 1.1, 00:04:9f:00:2d:b7
eth0: Running with NAPI disabled
eth0: 64/64 RX/TX BD ring size
eth1: Gianfar Ethernet Controller Version 1.1, 00:04:9f:00:2d:b8
eth1: Running with NAPI disabled
eth1: 64/64 RX/TX BD ring size
i2c /dev entries driver
NET: Registered protocol family 2
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP established hash table entries: 16384 (order: 5, 131072 bytes)
TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
NET: Registered protocol family 1
NET: Registered protocol family 17
Root-NFS: No NFS server available, giving up.
VFS: Unable to mount root fs via NFS, trying floppy.
VFS: Cannot open root device "mtdblock1" or unknown-block(2,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on
unknown-block(2,0)
 <0>Rebooting in 180 seconds..

Would anybody happen to know what is going wrong?  How am I supposed to
know the correct "root=" boot option for a rebuilt jffs2 using LTIB, if
that is my problem?

Thanks in advance for any hints,

Victor.

The information contained in this e-mail message is PRIVATE. It may contain confidential information and may be legally privileged. It is intended for the exclusive use of the addressee(s). If you are not the intended recipient, you are hereby notified that any dissemination, distribution or reproduction of this communication is strictly prohibited. If the intended recipient(s) cannot be reached or if a transmission problem has occurred, please notify the sender immediately by return e-mail and destroy all copies of this message. 
Thank you. 

[-- Attachment #2: Type: text/html, Size: 5900 bytes --]

^ permalink raw reply

* Re: 85xx FDT updates?
From: Kumar Gala @ 2006-04-24 21:43 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <1145900516.4251.18.camel@cashmere.sps.mot.com>


On Apr 24, 2006, at 12:41 PM, Jon Loeliger wrote:

> OK, so what is the deal with the 85xx patches
> that have been submitted for more FDT support?
> I get a variety of mixed stories, message and
> vague comments.  I want some facts and direction.
>
> Can anyone verify, validate, refute, deny or
> support any of these statements?  Not trying to
> be mean, point fingers or sound negative here,
> I'm just trying to pinpoint the apparent log-jam.
>
>
> 1. The 85xx patches that Andy submitted, that
>    have been pulled into Kumar's 85xx git tree
>    are going to sit there until further patches
>    are submitted that _remove_all_85xx_support_
>    from the arch/ppc tree as well.
>
> 2. The same patches are going to sit in Kumar's
>    tree until 8560 patches with support for CPM
>    are released and all 85xx ports appear fully
>    and completely in arch/powerpc.  BTW, CPM is
>    waiting on approval of Vitaly's CPM proposal
>    for OF FDT support.
>
> 3. Statements 1 and 2 combined.
>
> 4. The same 85xx patches have just slipped through
>    some mental crack and simply need to be pulled,
>    re-applied, re-drafted, re-submitted.  Oops, sorry.
>    The action item is in _________'s lap.
>
> 5. There was an unresolved comment/issue that
>    FSL still needs to address.  BTW, that issue
>    is ________.
>
> 6. We are waiting on something else and that
>    item is _______.
>
> 7. You know, you really need to address the _other_
>    85xx ports (such as TQMs) as well.  Can you
>    please do something about those as well?
>
> 8. These Linux patches aren't going to be picked
>    up until the corresponding U-Boot patches are
>    picked up as well.

My understanding is that paulus should pull them into powerpc.git.   
He hasn't given me any reason to believe that shouldn't just happen.   
I dont see any reason that we need to have other dependancies for  
code that works.  I'll push on paulus to pull the patches in my tree  
into powerpc.git.

However, I am against removing the arch/ppc support until the u-boot  
patches are picked up.  I think its bad form to give people a kernel  
they can't easily boot.

- k

^ permalink raw reply

* Re: [PATCH] DTC - validation by device_type
From: Paul Nasrat @ 2006-04-24 21:09 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev@ozlabs.org, Jon Loeliger
In-Reply-To: <5E473F0A-C2E2-4ABD-A9FA-E83C4B4CB6C3@kernel.crashing.org>

On Mon, 2006-04-24 at 22:55 +0200, Segher Boessenkool wrote:

> 1) It's hard to see how any (physical) network device would
> not have a "reg" property.  It is not required though.
> 
> 2) "local-mac-address" is not a required property.
> 
> 3) "mac-address" is only required for nodes that have been
> opened before.
> 
> 4) "address-bits" is not required (taken as 48 if absent).
> 
> Please see Annex A (look at the property names, and perhaps
> at the "network" device type).  _Always_ look at Annex A!
> It's normative.

My bad.  Jon, just drop this from your queue then. 

Paul 

^ permalink raw reply

* Re: [PATCH] DTC - validation by device_type
From: Segher Boessenkool @ 2006-04-24 20:55 UTC (permalink / raw)
  To: Paul Nasrat; +Cc: linuxppc-dev@ozlabs.org, Jon Loeliger
In-Reply-To: <1145905793.4082.27.camel@enki.eridu>

>>> +	CHECK_HAVE(net, "reg");
>>> +	CHECK_HAVE(net, "local-mac-address");
>>> +	CHECK_HAVE(net, "mac-address");
>>> +	CHECK_HAVE(net, "address-bits");

1) It's hard to see how any (physical) network device would
not have a "reg" property.  It is not required though.

2) "local-mac-address" is not a required property.

3) "mac-address" is only required for nodes that have been
opened before.

4) "address-bits" is not required (taken as 48 if absent).

Please see Annex A (look at the property names, and perhaps
at the "network" device type).  _Always_ look at Annex A!
It's normative.

Segher

^ permalink raw reply

* [PATCH 7/7] Print out debugging information during initialisation
From: Mel Gorman @ 2006-04-24 20:22 UTC (permalink / raw)
  To: davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
	linuxppc-dev
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


The zone and hole sizing code is new and unexpected problems showed up
on machines that were not covered by the pre-release tests. This patch
prints out useful information when those unexpected situations occur.

It is not expected that this patch become a permanent part of the set.


 mem_init.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 50 insertions(+), 4 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-106-breakout_mem_init/mm/mem_init.c linux-2.6.17-rc2-107-debug/mm/mem_init.c
--- linux-2.6.17-rc2-106-breakout_mem_init/mm/mem_init.c	2006-04-24 15:53:22.000000000 +0100
+++ linux-2.6.17-rc2-107-debug/mm/mem_init.c	2006-04-24 15:54:11.000000000 +0100
@@ -593,6 +593,7 @@ static __meminit void init_currently_emp
 }
 
 #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+
 /* Note: nid == MAX_NUMNODES returns first region */
 static int __init first_active_region_index_in_nid(int nid)
 {
@@ -645,13 +646,24 @@ void __init free_bootmem_with_active_reg
 	for_each_active_range_index_in_nid(i, nid) {
 		unsigned long size_pages = 0;
 		unsigned long end_pfn = early_node_map[i].end_pfn;
-		if (early_node_map[i].start_pfn >= max_low_pfn)
+		if (early_node_map[i].start_pfn >= max_low_pfn) {
+			printk("start_pfn %lu >= %lu\n", early_node_map[i].start_pfn,
+								max_low_pfn);
 			continue;
+		}
 
-		if (end_pfn > max_low_pfn)
+		if (end_pfn > max_low_pfn) {
+			printk("end_pfn %lu going back to %lu\n", early_node_map[i].end_pfn,
+									max_low_pfn);
 			end_pfn = max_low_pfn;
+		}
 
 		size_pages = end_pfn - early_node_map[i].start_pfn;
+		printk("free_bootmem_node(%d, %lu, %lu) :::: pfn ranges (%d, %lu, %lu)\n",
+				early_node_map[i].nid,
+				PFN_PHYS(early_node_map[i].start_pfn),
+				PFN_PHYS(size_pages),
+				early_node_map[i].nid, early_node_map[i].start_pfn, end_pfn);
 		free_bootmem_node(NODE_DATA(early_node_map[i].nid),
 				PFN_PHYS(early_node_map[i].start_pfn),
 				PFN_PHYS(size_pages));
@@ -661,10 +673,15 @@ void __init free_bootmem_with_active_reg
 void __init sparse_memory_present_with_active_regions(int nid)
 {
 	unsigned int i;
-	for_each_active_range_index_in_nid(i, nid)
+	for_each_active_range_index_in_nid(i, nid) {
+		printk("memory_present(%d, %lu, %lu)\n",
+			early_node_map[i].nid,
+			early_node_map[i].start_pfn,
+			early_node_map[i].end_pfn);
 		memory_present(early_node_map[i].nid,
 				early_node_map[i].start_pfn,
 				early_node_map[i].end_pfn);
+	}
 }
 
 void __init get_pfn_range_for_nid(unsigned int nid,
@@ -722,6 +739,8 @@ unsigned long __init __absent_pages_in_r
 	unsigned long prev_end_pfn = 0, hole_pages = 0;
 	unsigned long start_pfn;
 
+	printk("__absent_pages_in_range(%d, %lu, %lu) = ", nid,
+					range_start_pfn, range_end_pfn);
 	/* Find the end_pfn of the first active range of pfns in the node */
 	i = first_active_region_index_in_nid(nid);
 	prev_end_pfn = early_node_map[i].start_pfn;
@@ -749,6 +768,8 @@ unsigned long __init __absent_pages_in_r
 		prev_end_pfn = early_node_map[i].end_pfn;
 	}
 
+	printk("%lu\n", hole_pages);
+
 	return hole_pages;
 }
 
@@ -911,6 +932,9 @@ void __init add_active_range(unsigned in
 {
 	unsigned int i;
 
+	printk("add_active_range(%d, %lu, %lu): ",
+			nid, start_pfn, end_pfn);
+
 	/* Merge with existing active regions if possible */
 	for (i = 0; early_node_map[i].end_pfn; i++) {
 		if (early_node_map[i].nid != nid)
@@ -918,12 +942,15 @@ void __init add_active_range(unsigned in
 
 		/* Skip if an existing region covers this new one */
 		if (start_pfn >= early_node_map[i].start_pfn &&
-				end_pfn <= early_node_map[i].end_pfn)
+				end_pfn <= early_node_map[i].end_pfn) {
+			printk("Existing\n");
 			return;
+		}
 
 		/* Merge forward if suitable */
 		if (start_pfn <= early_node_map[i].end_pfn &&
 				end_pfn > early_node_map[i].end_pfn) {
+			printk("Merging forward\n");
 			early_node_map[i].end_pfn = end_pfn;
 			return;
 		}
@@ -931,6 +958,7 @@ void __init add_active_range(unsigned in
 		/* Merge backward if suitable */
 		if (start_pfn < early_node_map[i].end_pfn &&
 				end_pfn >= early_node_map[i].start_pfn) {
+			printk("Merging backwards\n");
 			early_node_map[i].start_pfn = start_pfn;
 			return;
 		}
@@ -942,6 +970,7 @@ void __init add_active_range(unsigned in
 		return;
 	}
 
+	printk("New\n");
 	early_node_map[i].nid = nid;
 	early_node_map[i].start_pfn = start_pfn;
 	early_node_map[i].end_pfn = end_pfn;
@@ -949,6 +978,7 @@ void __init add_active_range(unsigned in
 
 void __init remove_all_active_ranges()
 {
+	printk("remove_all_active_ranges()\n");
 	memset(early_node_map, 0, sizeof(early_node_map));
 }
 
@@ -976,6 +1006,14 @@ static void __init sort_node_map(void)
 
 	sort(early_node_map, num, sizeof(struct node_active_region),
 						cmp_node_active_region, NULL);
+
+	printk("Dumping sorted node map\n");
+	for (num = 0; early_node_map[num].end_pfn; num++) {
+		printk("entry %lu: %d  %lu -> %lu\n", num,
+				early_node_map[num].nid,
+				early_node_map[num].start_pfn,
+				early_node_map[num].end_pfn);
+	}
 }
 
 /* Find the lowest pfn for a node. This depends on a sorted early_node_map */
@@ -992,6 +1030,7 @@ unsigned long __init find_min_pfn_for_no
 	return 0;
 }
 
+
 unsigned long __init find_min_pfn_with_active_regions(void)
 {
 	return find_min_pfn_for_node(MAX_NUMNODES);
@@ -1005,6 +1044,7 @@ unsigned long __init find_max_pfn_with_a
 	for (i = 0; early_node_map[i].end_pfn; i++)
 		max_pfn = max(max_pfn, early_node_map[i].end_pfn);
 
+	printk("find_max_pfn_with_active_regions() == %lu\n", max_pfn);
 	return max_pfn;
 }
 
@@ -1016,6 +1056,10 @@ void __init free_area_init_nodes(unsigne
 	unsigned long nid;
 	int zone_index;
 
+	printk("free_area_init_nodes(%lu, %lu, %lu, %lu)\n",
+			arch_max_dma_pfn, arch_max_dma32_pfn,
+			arch_max_low_pfn, arch_max_high_pfn);
+
 	/* Record where the zone boundaries are */
 	memset(arch_zone_lowest_possible_pfn, 0,
 				sizeof(arch_zone_lowest_possible_pfn));
@@ -1032,6 +1076,8 @@ void __init free_area_init_nodes(unsigne
 			arch_zone_highest_possible_pfn[zone_index-1];
 	}
 
+	printk("free_area_init_nodes(): find_min_pfn = %lu\n", find_min_pfn_with_active_regions());
+
 	/* Regions in the early_node_map can be in any order */
 	sort_node_map();
 

^ permalink raw reply

* [PATCH 6/7] Break out memory initialisation code from page_alloc.c to mem_init.c
From: Mel Gorman @ 2006-04-24 20:22 UTC (permalink / raw)
  To: davej, tony.luck, linuxppc-dev, linux-kernel, bob.picco, ak,
	linux-mm
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


page_alloc.c contains a large amount of memory initialisation code. This patch
breaks out the initialisation code to a separate file to make page_alloc.c
a bit easier to read.


 Makefile     |    2 
 mem_init.c   | 1044 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 page_alloc.c | 1025 -----------------------------------------------------
 3 files changed, 1045 insertions(+), 1026 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-105-ia64_use_init_nodes/mm/Makefile linux-2.6.17-rc2-106-breakout_mem_init/mm/Makefile
--- linux-2.6.17-rc2-105-ia64_use_init_nodes/mm/Makefile	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-106-breakout_mem_init/mm/Makefile	2006-04-24 15:53:22.000000000 +0100
@@ -8,7 +8,7 @@ mmu-$(CONFIG_MMU)	:= fremap.o highmem.o 
 			   vmalloc.o
 
 obj-y			:= bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
-			   page_alloc.o page-writeback.o pdflush.o \
+			   page_alloc.o mem_init.o page-writeback.o pdflush.o \
 			   readahead.o swap.o truncate.o vmscan.o \
 			   prio_tree.o util.o mmzone.o $(mmu-y)
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-105-ia64_use_init_nodes/mm/mem_init.c linux-2.6.17-rc2-106-breakout_mem_init/mm/mem_init.c
--- linux-2.6.17-rc2-105-ia64_use_init_nodes/mm/mem_init.c	2006-04-24 09:36:35.000000000 +0100
+++ linux-2.6.17-rc2-106-breakout_mem_init/mm/mem_init.c	2006-04-24 15:53:22.000000000 +0100
@@ -0,0 +1,1044 @@
+/*
+ * mm/mem_init.c
+ * Initialises the architecture independant view of memory. pgdats, zones, etc
+ *
+ *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
+ *  Copyright (C) 1995, Stephen Tweedie
+ *  Copyright (C) July 1999, Gerhard Wichert, Siemens AG
+ *  Copyright (C) 1999, Ingo Molnar, Red Hat
+ *  Copyright (C) 1999, 2000, Kanoj Sarcar, SGI
+ *  Copyright (C) Sept 2000, Martin J. Bligh
+ *	(lots of bits borrowed from Ingo Molnar & Andrew Morton)
+ *  Copyright (C) Apr 2006, Mel Gorman, IBM
+ *	(lots of bits taken from architecture-specific code)
+ */
+#include <linux/config.h>
+#include <linux/sort.h>
+#include <linux/pfn.h>
+#include <linux/mm.h>
+#include <linux/bootmem.h>
+#include <linux/cpuset.h>
+#include <linux/mempolicy.h>
+#include <linux/sysctl.h>
+#include <linux/swap.h>
+#include <linux/cpu.h>
+
+static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
+int percpu_pagelist_fraction;
+
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+  #ifdef CONFIG_MAX_ACTIVE_REGIONS
+    #define MAX_ACTIVE_REGIONS CONFIG_MAX_ACTIVE_REGIONS
+  #else
+    #define MAX_ACTIVE_REGIONS (MAX_NR_ZONES * MAX_NUMNODES + 1)
+  #endif
+
+  struct node_active_region __initdata early_node_map[MAX_ACTIVE_REGIONS];
+  unsigned long __initdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
+  unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
+/*
+ * Builds allocation fallback zone lists.
+ *
+ * Add all populated zones of a node to the zonelist.
+ */
+static int __init build_zonelists_node(pg_data_t *pgdat,
+			struct zonelist *zonelist, int nr_zones, int zone_type)
+{
+	struct zone *zone;
+
+	BUG_ON(zone_type > ZONE_HIGHMEM);
+
+	do {
+		zone = pgdat->node_zones + zone_type;
+		if (populated_zone(zone)) {
+#ifndef CONFIG_HIGHMEM
+			BUG_ON(zone_type > ZONE_NORMAL);
+#endif
+			zonelist->zones[nr_zones++] = zone;
+			check_highest_zone(zone_type);
+		}
+		zone_type--;
+
+	} while (zone_type >= 0);
+	return nr_zones;
+}
+
+static inline int highest_zone(int zone_bits)
+{
+	int res = ZONE_NORMAL;
+	if (zone_bits & (__force int)__GFP_HIGHMEM)
+		res = ZONE_HIGHMEM;
+	if (zone_bits & (__force int)__GFP_DMA32)
+		res = ZONE_DMA32;
+	if (zone_bits & (__force int)__GFP_DMA)
+		res = ZONE_DMA;
+	return res;
+}
+
+#ifdef CONFIG_NUMA
+#define MAX_NODE_LOAD (num_online_nodes())
+static int __initdata node_load[MAX_NUMNODES];
+/**
+ * find_next_best_node - find the next node that should appear in a given node's fallback list
+ * @node: node whose fallback list we're appending
+ * @used_node_mask: nodemask_t of already used nodes
+ *
+ * We use a number of factors to determine which is the next node that should
+ * appear on a given node's fallback list.  The node should not have appeared
+ * already in @node's fallback list, and it should be the next closest node
+ * according to the distance array (which contains arbitrary distance values
+ * from each node to each node in the system), and should also prefer nodes
+ * with no CPUs, since presumably they'll have very little allocation pressure
+ * on them otherwise.
+ * It returns -1 if no node is found.
+ */
+static int __init find_next_best_node(int node, nodemask_t *used_node_mask)
+{
+	int n, val;
+	int min_val = INT_MAX;
+	int best_node = -1;
+
+	/* Use the local node if we haven't already */
+	if (!node_isset(node, *used_node_mask)) {
+		node_set(node, *used_node_mask);
+		return node;
+	}
+
+	for_each_online_node(n) {
+		cpumask_t tmp;
+
+		/* Don't want a node to appear more than once */
+		if (node_isset(n, *used_node_mask))
+			continue;
+
+		/* Use the distance array to find the distance */
+		val = node_distance(node, n);
+
+		/* Penalize nodes under us ("prefer the next node") */
+		val += (n < node);
+
+		/* Give preference to headless and unused nodes */
+		tmp = node_to_cpumask(n);
+		if (!cpus_empty(tmp))
+			val += PENALTY_FOR_NODE_WITH_CPUS;
+
+		/* Slight preference for less loaded node */
+		val *= (MAX_NODE_LOAD*MAX_NUMNODES);
+		val += node_load[n];
+
+		if (val < min_val) {
+			min_val = val;
+			best_node = n;
+		}
+	}
+
+	if (best_node >= 0)
+		node_set(best_node, *used_node_mask);
+
+	return best_node;
+}
+
+static void __init build_zonelists(pg_data_t *pgdat)
+{
+	int i, j, k, node, local_node;
+	int prev_node, load;
+	struct zonelist *zonelist;
+	nodemask_t used_mask;
+
+	/* initialize zonelists */
+	for (i = 0; i < GFP_ZONETYPES; i++) {
+		zonelist = pgdat->node_zonelists + i;
+		zonelist->zones[0] = NULL;
+	}
+
+	/* NUMA-aware ordering of nodes */
+	local_node = pgdat->node_id;
+	load = num_online_nodes();
+	prev_node = local_node;
+	nodes_clear(used_mask);
+	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+		int distance = node_distance(local_node, node);
+
+		/*
+		 * If another node is sufficiently far away then it is better
+		 * to reclaim pages in a zone before going off node.
+		 */
+		if (distance > RECLAIM_DISTANCE)
+			zone_reclaim_mode = 1;
+
+		/*
+		 * We don't want to pressure a particular node.
+		 * So adding penalty to the first node in same
+		 * distance group to make it round-robin.
+		 */
+
+		if (distance != node_distance(local_node, prev_node))
+			node_load[node] += load;
+		prev_node = node;
+		load--;
+		for (i = 0; i < GFP_ZONETYPES; i++) {
+			zonelist = pgdat->node_zonelists + i;
+			for (j = 0; zonelist->zones[j] != NULL; j++);
+
+			k = highest_zone(i);
+
+	 		j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
+			zonelist->zones[j] = NULL;
+		}
+	}
+}
+
+#else	/* CONFIG_NUMA */
+
+static void __init build_zonelists(pg_data_t *pgdat)
+{
+	int i, j, k, node, local_node;
+
+	local_node = pgdat->node_id;
+	for (i = 0; i < GFP_ZONETYPES; i++) {
+		struct zonelist *zonelist;
+
+		zonelist = pgdat->node_zonelists + i;
+
+		j = 0;
+		k = highest_zone(i);
+ 		j = build_zonelists_node(pgdat, zonelist, j, k);
+ 		/*
+ 		 * Now we build the zonelist so that it contains the zones
+ 		 * of all the other nodes.
+ 		 * We don't want to pressure a particular node, so when
+ 		 * building the zones for node N, we make sure that the
+ 		 * zones coming right after the local ones are those from
+ 		 * node N+1 (modulo N)
+ 		 */
+		for (node = local_node + 1; node < MAX_NUMNODES; node++) {
+			if (!node_online(node))
+				continue;
+			j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
+		}
+		for (node = 0; node < local_node; node++) {
+			if (!node_online(node))
+				continue;
+			j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
+		}
+
+		zonelist->zones[j] = NULL;
+	}
+}
+
+#endif	/* CONFIG_NUMA */
+
+void __init build_all_zonelists(void)
+{
+	int i;
+
+	for_each_online_node(i)
+		build_zonelists(NODE_DATA(i));
+	printk("Built %i zonelists\n", num_online_nodes());
+	cpuset_init_current_mems_allowed();
+}
+
+/*
+ * Helper functions to size the waitqueue hash table.
+ * Essentially these want to choose hash table sizes sufficiently
+ * large so that collisions trying to wait on pages are rare.
+ * But in fact, the number of active page waitqueues on typical
+ * systems is ridiculously low, less than 200. So this is even
+ * conservative, even though it seems large.
+ *
+ * The constant PAGES_PER_WAITQUEUE specifies the ratio of pages to
+ * waitqueues, i.e. the size of the waitq table given the number of pages.
+ */
+#define PAGES_PER_WAITQUEUE	256
+
+static inline unsigned long wait_table_size(unsigned long pages)
+{
+	unsigned long size = 1;
+
+	pages /= PAGES_PER_WAITQUEUE;
+
+	while (size < pages)
+		size <<= 1;
+
+	/*
+	 * Once we have dozens or even hundreds of threads sleeping
+	 * on IO we've got bigger problems than wait queue collision.
+	 * Limit the size of the wait table to a reasonable size.
+	 */
+	size = min(size, 4096UL);
+
+	return max(size, 4UL);
+}
+
+/*
+ * This is an integer logarithm so that shifts can be used later
+ * to extract the more random high bits from the multiplicative
+ * hash function before the remainder is taken.
+ */
+static inline unsigned long wait_table_bits(unsigned long size)
+{
+	return ffz(~size);
+}
+
+#define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
+
+#ifndef __HAVE_ARCH_MEMMAP_INIT
+#define memmap_init(size, nid, zone, start_pfn) \
+	memmap_init_zone((size), (nid), (zone), (start_pfn))
+#endif
+
+/*
+ * Initially all pages are reserved - free ones are freed
+ * up by free_all_bootmem() once the early boot process is
+ * done. Non-atomic initialization, single-pass.
+ */
+void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
+		unsigned long start_pfn)
+{
+	struct page *page;
+	unsigned long end_pfn = start_pfn + size;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!early_pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		set_page_links(page, zone, nid, pfn);
+		init_page_count(page);
+		reset_page_mapcount(page);
+		SetPageReserved(page);
+		INIT_LIST_HEAD(&page->lru);
+#ifdef WANT_PAGE_VIRTUAL
+		/* The shift won't overflow because ZONE_NORMAL is below 4G. */
+		if (!is_highmem_idx(zone))
+			set_page_address(page, __va(pfn << PAGE_SHIFT));
+#endif
+	}
+}
+
+void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone,
+				unsigned long size)
+{
+	int order;
+	for (order = 0; order < MAX_ORDER ; order++) {
+		INIT_LIST_HEAD(&zone->free_area[order].free_list);
+		zone->free_area[order].nr_free = 0;
+	}
+}
+
+#define ZONETABLE_INDEX(x, zone_nr)	((x << ZONES_SHIFT) | zone_nr)
+void zonetable_add(struct zone *zone, int nid, int zid, unsigned long pfn,
+		unsigned long size)
+{
+	unsigned long snum = pfn_to_section_nr(pfn);
+	unsigned long end = pfn_to_section_nr(pfn + size);
+
+	if (FLAGS_HAS_NODE)
+		zone_table[ZONETABLE_INDEX(nid, zid)] = zone;
+	else
+		for (; snum <= end; snum++)
+			zone_table[ZONETABLE_INDEX(snum, zid)] = zone;
+}
+
+static __meminit
+void zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
+{
+	int i;
+	struct pglist_data *pgdat = zone->zone_pgdat;
+
+	/*
+	 * The per-page waitqueue mechanism uses hashed waitqueues
+	 * per zone.
+	 */
+	zone->wait_table_size = wait_table_size(zone_size_pages);
+	zone->wait_table_bits =	wait_table_bits(zone->wait_table_size);
+	zone->wait_table = (wait_queue_head_t *)
+		alloc_bootmem_node(pgdat, zone->wait_table_size
+					* sizeof(wait_queue_head_t));
+
+	for(i = 0; i < zone->wait_table_size; ++i)
+		init_waitqueue_head(zone->wait_table + i);
+}
+
+/*
+ * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
+ * to the value high for the pageset p.
+ */
+static void setup_pagelist_highmark(struct per_cpu_pageset *p,
+				unsigned long high)
+{
+	struct per_cpu_pages *pcp;
+
+	pcp = &p->pcp[0]; /* hot list */
+	pcp->high = high;
+	pcp->batch = max(1UL, high/4);
+	if ((high/4) > (PAGE_SHIFT * 8))
+		pcp->batch = PAGE_SHIFT * 8;
+}
+
+/*
+ * percpu_pagelist_fraction - changes the pcp->high for each zone on each
+ * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
+ * can have before it gets flushed back to buddy allocator.
+ */
+int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
+	struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
+{
+	struct zone *zone;
+	unsigned int cpu;
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, file, buffer, length, ppos);
+	if (!write || (ret == -EINVAL))
+		return ret;
+	for_each_zone(zone) {
+		for_each_online_cpu(cpu) {
+			unsigned long  high;
+			high = zone->present_pages / percpu_pagelist_fraction;
+			setup_pagelist_highmark(zone_pcp(zone, cpu), high);
+		}
+	}
+	return 0;
+}
+
+static int __cpuinit zone_batchsize(struct zone *zone)
+{
+	int batch;
+
+	/*
+	 * The per-cpu-pages pools are set to around 1000th of the
+	 * size of the zone.  But no more than 1/2 of a meg.
+	 *
+	 * OK, so we don't know how big the cache is.  So guess.
+	 */
+	batch = zone->present_pages / 1024;
+	if (batch * PAGE_SIZE > 512 * 1024)
+		batch = (512 * 1024) / PAGE_SIZE;
+	batch /= 4;		/* We effectively *= 4 below */
+	if (batch < 1)
+		batch = 1;
+
+	/*
+	 * Clamp the batch to a 2^n - 1 value. Having a power
+	 * of 2 value was found to be more likely to have
+	 * suboptimal cache aliasing properties in some cases.
+	 *
+	 * For example if 2 tasks are alternately allocating
+	 * batches of pages, one task can end up with a lot
+	 * of pages of one half of the possible page colors
+	 * and the other with pages of the other colors.
+	 */
+	batch = (1 << (fls(batch + batch/2)-1)) - 1;
+
+	return batch;
+}
+
+inline void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+{
+	struct per_cpu_pages *pcp;
+
+	memset(p, 0, sizeof(*p));
+
+	pcp = &p->pcp[0];		/* hot */
+	pcp->count = 0;
+	pcp->high = 6 * batch;
+	pcp->batch = max(1UL, 1 * batch);
+	INIT_LIST_HEAD(&pcp->list);
+
+	pcp = &p->pcp[1];		/* cold*/
+	pcp->count = 0;
+	pcp->high = 2 * batch;
+	pcp->batch = max(1UL, batch/2);
+	INIT_LIST_HEAD(&pcp->list);
+}
+
+#ifdef CONFIG_NUMA
+/*
+ * Boot pageset table. One per cpu which is going to be used for all
+ * zones and all nodes. The parameters will be set in such a way
+ * that an item put on a list will immediately be handed over to
+ * the buddy list. This is safe since pageset manipulation is done
+ * with interrupts disabled.
+ *
+ * Some NUMA counter updates may also be caught by the boot pagesets.
+ *
+ * The boot_pagesets must be kept even after bootup is complete for
+ * unused processors and/or zones. They do play a role for bootstrapping
+ * hotplugged processors.
+ *
+ * zoneinfo_show() and maybe other functions do
+ * not check if the processor is online before following the pageset pointer.
+ * Other parts of the kernel may not check if the zone is available.
+ */
+static struct per_cpu_pageset boot_pageset[NR_CPUS];
+
+/*
+ * Dynamically allocate memory for the
+ * per cpu pageset array in struct zone.
+ */
+static int __cpuinit process_zones(int cpu)
+{
+	struct zone *zone, *dzone;
+
+	for_each_zone(zone) {
+
+		zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
+					 GFP_KERNEL, cpu_to_node(cpu));
+		if (!zone_pcp(zone, cpu))
+			goto bad;
+
+		setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
+
+		if (percpu_pagelist_fraction)
+			setup_pagelist_highmark(zone_pcp(zone, cpu),
+			 	(zone->present_pages / percpu_pagelist_fraction));
+	}
+
+	return 0;
+bad:
+	for_each_zone(dzone) {
+		if (dzone == zone)
+			break;
+		kfree(zone_pcp(dzone, cpu));
+		zone_pcp(dzone, cpu) = NULL;
+	}
+	return -ENOMEM;
+}
+
+static inline void free_zone_pagesets(int cpu)
+{
+	struct zone *zone;
+
+	for_each_zone(zone) {
+		struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
+
+		zone_pcp(zone, cpu) = NULL;
+		kfree(pset);
+	}
+}
+
+static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
+		unsigned long action,
+		void *hcpu)
+{
+	int cpu = (long)hcpu;
+	int ret = NOTIFY_OK;
+
+	switch (action) {
+		case CPU_UP_PREPARE:
+			if (process_zones(cpu))
+				ret = NOTIFY_BAD;
+			break;
+		case CPU_UP_CANCELED:
+		case CPU_DEAD:
+			free_zone_pagesets(cpu);
+			break;
+		default:
+			break;
+	}
+	return ret;
+}
+
+static struct notifier_block pageset_notifier =
+	{ &pageset_cpuup_callback, NULL, 0 };
+
+void __init setup_per_cpu_pageset(void)
+{
+	int err;
+
+	/* Initialize per_cpu_pageset for cpu 0.
+	 * A cpuup callback will do this for every cpu
+	 * as it comes online
+	 */
+	err = process_zones(smp_processor_id());
+	BUG_ON(err);
+	register_cpu_notifier(&pageset_notifier);
+}
+#endif
+
+static __meminit void zone_pcp_init(struct zone *zone)
+{
+	int cpu;
+	unsigned long batch = zone_batchsize(zone);
+
+	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+#ifdef CONFIG_NUMA
+		/* Early boot. Slab allocator not functional yet */
+		zone_pcp(zone, cpu) = &boot_pageset[cpu];
+		setup_pageset(&boot_pageset[cpu],0);
+#else
+		setup_pageset(zone_pcp(zone,cpu), batch);
+#endif
+	}
+	if (zone->present_pages)
+		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%lu\n",
+			zone->name, zone->present_pages, batch);
+}
+
+static __meminit void init_currently_empty_zone(struct zone *zone,
+		unsigned long zone_start_pfn, unsigned long size)
+{
+	struct pglist_data *pgdat = zone->zone_pgdat;
+
+	zone_wait_table_init(zone, size);
+	pgdat->nr_zones = zone_idx(zone) + 1;
+
+	zone->zone_start_pfn = zone_start_pfn;
+
+	memmap_init(size, pgdat->node_id, zone_idx(zone), zone_start_pfn);
+
+	zone_init_free_lists(pgdat, zone, zone->spanned_pages);
+}
+
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/* Note: nid == MAX_NUMNODES returns first region */
+static int __init first_active_region_index_in_nid(int nid)
+{
+	int i;
+	for (i = 0; early_node_map[i].end_pfn; i++) {
+		if (nid == MAX_NUMNODES || early_node_map[i].nid == nid)
+			return i;
+	}
+
+	return MAX_ACTIVE_REGIONS;
+}
+
+/* Note: nid == MAX_NUMNODES returns next region */
+static int __init next_active_region_index_in_nid(unsigned int index, int nid)
+{
+	for (index = index + 1; early_node_map[index].end_pfn; index++) {
+		if (nid == MAX_NUMNODES || early_node_map[index].nid == nid)
+			return index;
+	}
+
+	return MAX_ACTIVE_REGIONS;
+}
+
+#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+int __init early_pfn_to_nid(unsigned long pfn)
+{
+	int i;
+
+	for (i = 0; early_node_map[i].end_pfn; i++) {
+		unsigned long start_pfn = early_node_map[i].start_pfn;
+		unsigned long end_pfn = early_node_map[i].end_pfn;
+
+		if ((start_pfn <= pfn) && (pfn < end_pfn))
+			return early_node_map[i].nid;
+	}
+
+	return -1;
+}
+#endif
+
+#define for_each_active_range_index_in_nid(i, nid) \
+	for (i = first_active_region_index_in_nid(nid); \
+				i != MAX_ACTIVE_REGIONS; \
+				i = next_active_region_index_in_nid(i, nid))
+
+void __init free_bootmem_with_active_regions(int nid,
+						unsigned long max_low_pfn)
+{
+	unsigned int i;
+	for_each_active_range_index_in_nid(i, nid) {
+		unsigned long size_pages = 0;
+		unsigned long end_pfn = early_node_map[i].end_pfn;
+		if (early_node_map[i].start_pfn >= max_low_pfn)
+			continue;
+
+		if (end_pfn > max_low_pfn)
+			end_pfn = max_low_pfn;
+
+		size_pages = end_pfn - early_node_map[i].start_pfn;
+		free_bootmem_node(NODE_DATA(early_node_map[i].nid),
+				PFN_PHYS(early_node_map[i].start_pfn),
+				PFN_PHYS(size_pages));
+	}
+}
+
+void __init sparse_memory_present_with_active_regions(int nid)
+{
+	unsigned int i;
+	for_each_active_range_index_in_nid(i, nid)
+		memory_present(early_node_map[i].nid,
+				early_node_map[i].start_pfn,
+				early_node_map[i].end_pfn);
+}
+
+void __init get_pfn_range_for_nid(unsigned int nid,
+			unsigned long *start_pfn, unsigned long *end_pfn)
+{
+	unsigned int i;
+	*start_pfn = -1UL;
+	*end_pfn = 0;
+
+	for_each_active_range_index_in_nid(i, nid) {
+		if (early_node_map[i].start_pfn < *start_pfn)
+			*start_pfn = early_node_map[i].start_pfn;
+
+		if (early_node_map[i].end_pfn > *end_pfn)
+			*end_pfn = early_node_map[i].end_pfn;
+	}
+
+	if (*start_pfn == -1UL) {
+		printk(KERN_WARNING "Node %u active with no memory\n", nid);
+		*start_pfn = 0;
+	}
+}
+
+unsigned long __init zone_present_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *ignored)
+{
+	unsigned long node_start_pfn, node_end_pfn;
+	unsigned long zone_start_pfn, zone_end_pfn;
+
+	/* Get the start and end of the node and zone */
+	get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
+	zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
+	zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+
+	/* Check that this node has pages within the zone's required range */
+	if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
+		return 0;
+
+	/* Move the zone boundaries inside the node if necessary */
+	if (zone_end_pfn > node_end_pfn)
+		zone_end_pfn = node_end_pfn;
+	if (zone_start_pfn < node_start_pfn)
+		zone_start_pfn = node_start_pfn;
+
+	/* Return the spanned pages */
+	return zone_end_pfn - zone_start_pfn;
+}
+
+unsigned long __init __absent_pages_in_range(int nid,
+				unsigned long range_start_pfn,
+				unsigned long range_end_pfn)
+{
+	int i = 0;
+	unsigned long prev_end_pfn = 0, hole_pages = 0;
+	unsigned long start_pfn;
+
+	/* Find the end_pfn of the first active range of pfns in the node */
+	i = first_active_region_index_in_nid(nid);
+	prev_end_pfn = early_node_map[i].start_pfn;
+
+	/* Find all holes for the zone within the node */
+	for (; i != MAX_ACTIVE_REGIONS;
+			i = next_active_region_index_in_nid(i, nid)) {
+
+		/* No need to continue if prev_end_pfn is outside the zone */
+		if (prev_end_pfn >= range_end_pfn)
+			break;
+
+		/* Make sure the end of the zone is not within the hole */
+		start_pfn = early_node_map[i].start_pfn;
+		if (start_pfn > range_end_pfn)
+			start_pfn = range_end_pfn;
+		if (prev_end_pfn < range_start_pfn)
+			prev_end_pfn = range_start_pfn;
+
+		/* Update the hole size cound and move on */
+		if (start_pfn > range_start_pfn) {
+			BUG_ON(prev_end_pfn > start_pfn);
+			hole_pages += start_pfn - prev_end_pfn;
+		}
+		prev_end_pfn = early_node_map[i].end_pfn;
+	}
+
+	return hole_pages;
+}
+
+unsigned long __init absent_pages_in_range(unsigned long start_pfn,
+							unsigned long end_pfn)
+{
+	return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn);
+}
+
+unsigned long __init zone_absent_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *ignored)
+{
+	return __absent_pages_in_range(nid,
+				arch_zone_lowest_possible_pfn[zone_type],
+				arch_zone_highest_possible_pfn[zone_type]);
+}
+#else
+static inline unsigned long zone_present_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *zones_size)
+{
+	return zones_size[zone_type];
+}
+
+static inline unsigned long zone_absent_pages_in_node(int nid,
+						unsigned long zone_type,
+						unsigned long *zholes_size)
+{
+	if (!zholes_size)
+		return 0;
+
+	return zholes_size[zone_type];
+}
+#endif
+
+static void __init calculate_node_totalpages(struct pglist_data *pgdat,
+		unsigned long *zones_size, unsigned long *zholes_size)
+{
+	unsigned long realtotalpages, totalpages = 0;
+	int i;
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		totalpages += zone_present_pages_in_node(pgdat->node_id, i,
+								zones_size);
+	}
+	pgdat->node_spanned_pages = totalpages;
+
+	realtotalpages = totalpages;
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		realtotalpages -=
+			zone_absent_pages_in_node(pgdat->node_id, i, zholes_size);
+	}
+	pgdat->node_present_pages = realtotalpages;
+	printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id,
+							realtotalpages);
+}
+
+/*
+ * Set up the zone data structures:
+ *   - mark all pages reserved
+ *   - mark all memory queues empty
+ *   - clear the memory bitmaps
+ */
+static void __init free_area_init_core(struct pglist_data *pgdat,
+		unsigned long *zones_size, unsigned long *zholes_size)
+{
+	unsigned long j;
+	int nid = pgdat->node_id;
+	unsigned long zone_start_pfn = pgdat->node_start_pfn;
+
+	pgdat_resize_init(pgdat);
+	pgdat->nr_zones = 0;
+	init_waitqueue_head(&pgdat->kswapd_wait);
+	pgdat->kswapd_max_order = 0;
+
+	for (j = 0; j < MAX_NR_ZONES; j++) {
+		struct zone *zone = pgdat->node_zones + j;
+		unsigned long size, realsize;
+
+		size = zone_present_pages_in_node(nid, j, zones_size);
+		realsize = size - zone_absent_pages_in_node(nid, j,
+								zholes_size);
+		if (j < ZONE_HIGHMEM)
+			nr_kernel_pages += realsize;
+		nr_all_pages += realsize;
+
+		zone->spanned_pages = size;
+		zone->present_pages = realsize;
+		zone->name = zone_names[j];
+		spin_lock_init(&zone->lock);
+		spin_lock_init(&zone->lru_lock);
+		zone_seqlock_init(zone);
+		zone->zone_pgdat = pgdat;
+		zone->free_pages = 0;
+
+		zone->temp_priority = zone->prev_priority = DEF_PRIORITY;
+
+		zone_pcp_init(zone);
+		INIT_LIST_HEAD(&zone->active_list);
+		INIT_LIST_HEAD(&zone->inactive_list);
+		zone->nr_scan_active = 0;
+		zone->nr_scan_inactive = 0;
+		zone->nr_active = 0;
+		zone->nr_inactive = 0;
+		atomic_set(&zone->reclaim_in_progress, 0);
+		if (!size)
+			continue;
+
+		zonetable_add(zone, nid, j, zone_start_pfn, size);
+		init_currently_empty_zone(zone, zone_start_pfn, size);
+		zone_start_pfn += size;
+	}
+}
+
+static void __init alloc_node_mem_map(struct pglist_data *pgdat)
+{
+	/* Skip empty nodes */
+	if (!pgdat->node_spanned_pages)
+		return;
+
+#ifdef CONFIG_FLAT_NODE_MEM_MAP
+	/* ia64 gets its own node_mem_map, before this, without bootmem */
+	if (!pgdat->node_mem_map) {
+		unsigned long size;
+		struct page *map;
+
+		size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
+		map = alloc_remap(pgdat->node_id, size);
+		if (!map)
+			map = alloc_bootmem_node(pgdat, size);
+		pgdat->node_mem_map = map;
+	}
+#ifdef CONFIG_FLATMEM
+	/*
+	 * With no DISCONTIG, the global mem_map is just set as node 0's
+	 */
+	if (pgdat == NODE_DATA(0))
+		mem_map = NODE_DATA(0)->node_mem_map;
+#endif
+#endif /* CONFIG_FLAT_NODE_MEM_MAP */
+}
+
+void __init free_area_init_node(int nid, struct pglist_data *pgdat,
+		unsigned long *zones_size, unsigned long node_start_pfn,
+		unsigned long *zholes_size)
+{
+	pgdat->node_id = nid;
+	pgdat->node_start_pfn = node_start_pfn;
+	calculate_node_totalpages(pgdat, zones_size, zholes_size);
+
+	alloc_node_mem_map(pgdat);
+
+	free_area_init_core(pgdat, zones_size, zholes_size);
+}
+
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+void __init add_active_range(unsigned int nid, unsigned long start_pfn,
+						unsigned long end_pfn)
+{
+	unsigned int i;
+
+	/* Merge with existing active regions if possible */
+	for (i = 0; early_node_map[i].end_pfn; i++) {
+		if (early_node_map[i].nid != nid)
+			continue;
+
+		/* Skip if an existing region covers this new one */
+		if (start_pfn >= early_node_map[i].start_pfn &&
+				end_pfn <= early_node_map[i].end_pfn)
+			return;
+
+		/* Merge forward if suitable */
+		if (start_pfn <= early_node_map[i].end_pfn &&
+				end_pfn > early_node_map[i].end_pfn) {
+			early_node_map[i].end_pfn = end_pfn;
+			return;
+		}
+
+		/* Merge backward if suitable */
+		if (start_pfn < early_node_map[i].end_pfn &&
+				end_pfn >= early_node_map[i].start_pfn) {
+			early_node_map[i].start_pfn = start_pfn;
+			return;
+		}
+	}
+
+	/* Leave last entry NULL, we use range.end_pfn to terminate the walk */
+	if (i >= MAX_ACTIVE_REGIONS - 1) {
+		printk(KERN_ERR "Too many memory regions, truncating\n");
+		return;
+	}
+
+	early_node_map[i].nid = nid;
+	early_node_map[i].start_pfn = start_pfn;
+	early_node_map[i].end_pfn = end_pfn;
+}
+
+void __init remove_all_active_ranges()
+{
+	memset(early_node_map, 0, sizeof(early_node_map));
+}
+
+/* Compare two active node_active_regions */
+static int __init cmp_node_active_region(const void *a, const void *b)
+{
+	struct node_active_region *arange = (struct node_active_region *)a;
+	struct node_active_region *brange = (struct node_active_region *)b;
+
+	/* Done this way to avoid overflows */
+	if (arange->start_pfn > brange->start_pfn)
+		return 1;
+	if (arange->start_pfn < brange->start_pfn)
+		return -1;
+
+	return 0;
+}
+
+/* sort the node_map by start_pfn */
+static void __init sort_node_map(void)
+{
+	size_t num = 0;
+	while (early_node_map[num].end_pfn)
+		num++;
+
+	sort(early_node_map, num, sizeof(struct node_active_region),
+						cmp_node_active_region, NULL);
+}
+
+/* Find the lowest pfn for a node. This depends on a sorted early_node_map */
+unsigned long __init find_min_pfn_for_node(unsigned long nid)
+{
+	int i;
+
+	/* Assuming a sorted map, the first range found has the starting pfn */
+	for_each_active_range_index_in_nid(i, nid)
+		return early_node_map[i].start_pfn;
+
+	/* nid does not exist in early_node_map */
+	printk(KERN_WARNING "Could not find start_pfn for node %lu\n", nid);
+	return 0;
+}
+
+unsigned long __init find_min_pfn_with_active_regions(void)
+{
+	return find_min_pfn_for_node(MAX_NUMNODES);
+}
+
+unsigned long __init find_max_pfn_with_active_regions(void)
+{
+	int i;
+	unsigned long max_pfn = 0;
+
+	for (i = 0; early_node_map[i].end_pfn; i++)
+		max_pfn = max(max_pfn, early_node_map[i].end_pfn);
+
+	return max_pfn;
+}
+
+void __init free_area_init_nodes(unsigned long arch_max_dma_pfn,
+				unsigned long arch_max_dma32_pfn,
+				unsigned long arch_max_low_pfn,
+				unsigned long arch_max_high_pfn)
+{
+	unsigned long nid;
+	int zone_index;
+
+	/* Record where the zone boundaries are */
+	memset(arch_zone_lowest_possible_pfn, 0,
+				sizeof(arch_zone_lowest_possible_pfn));
+	memset(arch_zone_highest_possible_pfn, 0,
+				sizeof(arch_zone_highest_possible_pfn));
+	arch_zone_lowest_possible_pfn[ZONE_DMA] =
+					find_min_pfn_with_active_regions();
+	arch_zone_highest_possible_pfn[ZONE_DMA] = arch_max_dma_pfn;
+	arch_zone_highest_possible_pfn[ZONE_DMA32] = arch_max_dma32_pfn;
+	arch_zone_highest_possible_pfn[ZONE_NORMAL] = arch_max_low_pfn;
+	arch_zone_highest_possible_pfn[ZONE_HIGHMEM] = arch_max_high_pfn;
+	for (zone_index = 1; zone_index < MAX_NR_ZONES; zone_index++) {
+		arch_zone_lowest_possible_pfn[zone_index] = 
+			arch_zone_highest_possible_pfn[zone_index-1];
+	}
+
+	/* Regions in the early_node_map can be in any order */
+	sort_node_map();
+
+	for_each_online_node(nid) {
+		pg_data_t *pgdat = NODE_DATA(nid);
+		free_area_init_node(nid, pgdat, NULL,
+				find_min_pfn_for_node(nid), NULL);
+	}
+}
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-105-ia64_use_init_nodes/mm/page_alloc.c linux-2.6.17-rc2-106-breakout_mem_init/mm/page_alloc.c
--- linux-2.6.17-rc2-105-ia64_use_init_nodes/mm/page_alloc.c	2006-04-24 15:49:25.000000000 +0100
+++ linux-2.6.17-rc2-106-breakout_mem_init/mm/page_alloc.c	2006-04-24 15:53:22.000000000 +0100
@@ -37,8 +37,6 @@
 #include <linux/nodemask.h>
 #include <linux/vmalloc.h>
 #include <linux/mempolicy.h>
-#include <linux/sort.h>
-#include <linux/pfn.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -55,7 +53,6 @@ unsigned long totalram_pages __read_most
 unsigned long totalhigh_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
 long nr_swap_pages;
-int percpu_pagelist_fraction;
 
 static void __free_pages_ok(struct page *page, unsigned int order);
 
@@ -81,24 +78,11 @@ EXPORT_SYMBOL(totalram_pages);
 struct zone *zone_table[1 << ZONETABLE_SHIFT] __read_mostly;
 EXPORT_SYMBOL(zone_table);
 
-static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
 int min_free_kbytes = 1024;
 
 unsigned long __initdata nr_kernel_pages;
 unsigned long __initdata nr_all_pages;
 
-#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
-  #ifdef CONFIG_MAX_ACTIVE_REGIONS
-    #define MAX_ACTIVE_REGIONS CONFIG_MAX_ACTIVE_REGIONS
-  #else
-    #define MAX_ACTIVE_REGIONS (MAX_NR_ZONES * MAX_NUMNODES + 1)
-  #endif
-
-  struct node_active_region __initdata early_node_map[MAX_ACTIVE_REGIONS];
-  unsigned long __initdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
-  unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
-#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
-
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -1517,989 +1501,6 @@ void show_free_areas(void)
 	show_swap_cache_info();
 }
 
-/*
- * Builds allocation fallback zone lists.
- *
- * Add all populated zones of a node to the zonelist.
- */
-static int __init build_zonelists_node(pg_data_t *pgdat,
-			struct zonelist *zonelist, int nr_zones, int zone_type)
-{
-	struct zone *zone;
-
-	BUG_ON(zone_type > ZONE_HIGHMEM);
-
-	do {
-		zone = pgdat->node_zones + zone_type;
-		if (populated_zone(zone)) {
-#ifndef CONFIG_HIGHMEM
-			BUG_ON(zone_type > ZONE_NORMAL);
-#endif
-			zonelist->zones[nr_zones++] = zone;
-			check_highest_zone(zone_type);
-		}
-		zone_type--;
-
-	} while (zone_type >= 0);
-	return nr_zones;
-}
-
-static inline int highest_zone(int zone_bits)
-{
-	int res = ZONE_NORMAL;
-	if (zone_bits & (__force int)__GFP_HIGHMEM)
-		res = ZONE_HIGHMEM;
-	if (zone_bits & (__force int)__GFP_DMA32)
-		res = ZONE_DMA32;
-	if (zone_bits & (__force int)__GFP_DMA)
-		res = ZONE_DMA;
-	return res;
-}
-
-#ifdef CONFIG_NUMA
-#define MAX_NODE_LOAD (num_online_nodes())
-static int __initdata node_load[MAX_NUMNODES];
-/**
- * find_next_best_node - find the next node that should appear in a given node's fallback list
- * @node: node whose fallback list we're appending
- * @used_node_mask: nodemask_t of already used nodes
- *
- * We use a number of factors to determine which is the next node that should
- * appear on a given node's fallback list.  The node should not have appeared
- * already in @node's fallback list, and it should be the next closest node
- * according to the distance array (which contains arbitrary distance values
- * from each node to each node in the system), and should also prefer nodes
- * with no CPUs, since presumably they'll have very little allocation pressure
- * on them otherwise.
- * It returns -1 if no node is found.
- */
-static int __init find_next_best_node(int node, nodemask_t *used_node_mask)
-{
-	int n, val;
-	int min_val = INT_MAX;
-	int best_node = -1;
-
-	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
-		node_set(node, *used_node_mask);
-		return node;
-	}
-
-	for_each_online_node(n) {
-		cpumask_t tmp;
-
-		/* Don't want a node to appear more than once */
-		if (node_isset(n, *used_node_mask))
-			continue;
-
-		/* Use the distance array to find the distance */
-		val = node_distance(node, n);
-
-		/* Penalize nodes under us ("prefer the next node") */
-		val += (n < node);
-
-		/* Give preference to headless and unused nodes */
-		tmp = node_to_cpumask(n);
-		if (!cpus_empty(tmp))
-			val += PENALTY_FOR_NODE_WITH_CPUS;
-
-		/* Slight preference for less loaded node */
-		val *= (MAX_NODE_LOAD*MAX_NUMNODES);
-		val += node_load[n];
-
-		if (val < min_val) {
-			min_val = val;
-			best_node = n;
-		}
-	}
-
-	if (best_node >= 0)
-		node_set(best_node, *used_node_mask);
-
-	return best_node;
-}
-
-static void __init build_zonelists(pg_data_t *pgdat)
-{
-	int i, j, k, node, local_node;
-	int prev_node, load;
-	struct zonelist *zonelist;
-	nodemask_t used_mask;
-
-	/* initialize zonelists */
-	for (i = 0; i < GFP_ZONETYPES; i++) {
-		zonelist = pgdat->node_zonelists + i;
-		zonelist->zones[0] = NULL;
-	}
-
-	/* NUMA-aware ordering of nodes */
-	local_node = pgdat->node_id;
-	load = num_online_nodes();
-	prev_node = local_node;
-	nodes_clear(used_mask);
-	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
-		int distance = node_distance(local_node, node);
-
-		/*
-		 * If another node is sufficiently far away then it is better
-		 * to reclaim pages in a zone before going off node.
-		 */
-		if (distance > RECLAIM_DISTANCE)
-			zone_reclaim_mode = 1;
-
-		/*
-		 * We don't want to pressure a particular node.
-		 * So adding penalty to the first node in same
-		 * distance group to make it round-robin.
-		 */
-
-		if (distance != node_distance(local_node, prev_node))
-			node_load[node] += load;
-		prev_node = node;
-		load--;
-		for (i = 0; i < GFP_ZONETYPES; i++) {
-			zonelist = pgdat->node_zonelists + i;
-			for (j = 0; zonelist->zones[j] != NULL; j++);
-
-			k = highest_zone(i);
-
-	 		j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
-			zonelist->zones[j] = NULL;
-		}
-	}
-}
-
-#else	/* CONFIG_NUMA */
-
-static void __init build_zonelists(pg_data_t *pgdat)
-{
-	int i, j, k, node, local_node;
-
-	local_node = pgdat->node_id;
-	for (i = 0; i < GFP_ZONETYPES; i++) {
-		struct zonelist *zonelist;
-
-		zonelist = pgdat->node_zonelists + i;
-
-		j = 0;
-		k = highest_zone(i);
- 		j = build_zonelists_node(pgdat, zonelist, j, k);
- 		/*
- 		 * Now we build the zonelist so that it contains the zones
- 		 * of all the other nodes.
- 		 * We don't want to pressure a particular node, so when
- 		 * building the zones for node N, we make sure that the
- 		 * zones coming right after the local ones are those from
- 		 * node N+1 (modulo N)
- 		 */
-		for (node = local_node + 1; node < MAX_NUMNODES; node++) {
-			if (!node_online(node))
-				continue;
-			j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
-		}
-		for (node = 0; node < local_node; node++) {
-			if (!node_online(node))
-				continue;
-			j = build_zonelists_node(NODE_DATA(node), zonelist, j, k);
-		}
-
-		zonelist->zones[j] = NULL;
-	}
-}
-
-#endif	/* CONFIG_NUMA */
-
-void __init build_all_zonelists(void)
-{
-	int i;
-
-	for_each_online_node(i)
-		build_zonelists(NODE_DATA(i));
-	printk("Built %i zonelists\n", num_online_nodes());
-	cpuset_init_current_mems_allowed();
-}
-
-/*
- * Helper functions to size the waitqueue hash table.
- * Essentially these want to choose hash table sizes sufficiently
- * large so that collisions trying to wait on pages are rare.
- * But in fact, the number of active page waitqueues on typical
- * systems is ridiculously low, less than 200. So this is even
- * conservative, even though it seems large.
- *
- * The constant PAGES_PER_WAITQUEUE specifies the ratio of pages to
- * waitqueues, i.e. the size of the waitq table given the number of pages.
- */
-#define PAGES_PER_WAITQUEUE	256
-
-static inline unsigned long wait_table_size(unsigned long pages)
-{
-	unsigned long size = 1;
-
-	pages /= PAGES_PER_WAITQUEUE;
-
-	while (size < pages)
-		size <<= 1;
-
-	/*
-	 * Once we have dozens or even hundreds of threads sleeping
-	 * on IO we've got bigger problems than wait queue collision.
-	 * Limit the size of the wait table to a reasonable size.
-	 */
-	size = min(size, 4096UL);
-
-	return max(size, 4UL);
-}
-
-/*
- * This is an integer logarithm so that shifts can be used later
- * to extract the more random high bits from the multiplicative
- * hash function before the remainder is taken.
- */
-static inline unsigned long wait_table_bits(unsigned long size)
-{
-	return ffz(~size);
-}
-
-#define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
-
-/*
- * Initially all pages are reserved - free ones are freed
- * up by free_all_bootmem() once the early boot process is
- * done. Non-atomic initialization, single-pass.
- */
-void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-		unsigned long start_pfn)
-{
-	struct page *page;
-	unsigned long end_pfn = start_pfn + size;
-	unsigned long pfn;
-
-	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-		if (!early_pfn_valid(pfn))
-			continue;
-		page = pfn_to_page(pfn);
-		set_page_links(page, zone, nid, pfn);
-		init_page_count(page);
-		reset_page_mapcount(page);
-		SetPageReserved(page);
-		INIT_LIST_HEAD(&page->lru);
-#ifdef WANT_PAGE_VIRTUAL
-		/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-		if (!is_highmem_idx(zone))
-			set_page_address(page, __va(pfn << PAGE_SHIFT));
-#endif
-	}
-}
-
-void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone,
-				unsigned long size)
-{
-	int order;
-	for (order = 0; order < MAX_ORDER ; order++) {
-		INIT_LIST_HEAD(&zone->free_area[order].free_list);
-		zone->free_area[order].nr_free = 0;
-	}
-}
-
-#define ZONETABLE_INDEX(x, zone_nr)	((x << ZONES_SHIFT) | zone_nr)
-void zonetable_add(struct zone *zone, int nid, int zid, unsigned long pfn,
-		unsigned long size)
-{
-	unsigned long snum = pfn_to_section_nr(pfn);
-	unsigned long end = pfn_to_section_nr(pfn + size);
-
-	if (FLAGS_HAS_NODE)
-		zone_table[ZONETABLE_INDEX(nid, zid)] = zone;
-	else
-		for (; snum <= end; snum++)
-			zone_table[ZONETABLE_INDEX(snum, zid)] = zone;
-}
-
-#ifndef __HAVE_ARCH_MEMMAP_INIT
-#define memmap_init(size, nid, zone, start_pfn) \
-	memmap_init_zone((size), (nid), (zone), (start_pfn))
-#endif
-
-static int __cpuinit zone_batchsize(struct zone *zone)
-{
-	int batch;
-
-	/*
-	 * The per-cpu-pages pools are set to around 1000th of the
-	 * size of the zone.  But no more than 1/2 of a meg.
-	 *
-	 * OK, so we don't know how big the cache is.  So guess.
-	 */
-	batch = zone->present_pages / 1024;
-	if (batch * PAGE_SIZE > 512 * 1024)
-		batch = (512 * 1024) / PAGE_SIZE;
-	batch /= 4;		/* We effectively *= 4 below */
-	if (batch < 1)
-		batch = 1;
-
-	/*
-	 * Clamp the batch to a 2^n - 1 value. Having a power
-	 * of 2 value was found to be more likely to have
-	 * suboptimal cache aliasing properties in some cases.
-	 *
-	 * For example if 2 tasks are alternately allocating
-	 * batches of pages, one task can end up with a lot
-	 * of pages of one half of the possible page colors
-	 * and the other with pages of the other colors.
-	 */
-	batch = (1 << (fls(batch + batch/2)-1)) - 1;
-
-	return batch;
-}
-
-inline void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
-{
-	struct per_cpu_pages *pcp;
-
-	memset(p, 0, sizeof(*p));
-
-	pcp = &p->pcp[0];		/* hot */
-	pcp->count = 0;
-	pcp->high = 6 * batch;
-	pcp->batch = max(1UL, 1 * batch);
-	INIT_LIST_HEAD(&pcp->list);
-
-	pcp = &p->pcp[1];		/* cold*/
-	pcp->count = 0;
-	pcp->high = 2 * batch;
-	pcp->batch = max(1UL, batch/2);
-	INIT_LIST_HEAD(&pcp->list);
-}
-
-/*
- * setup_pagelist_highmark() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-
-static void setup_pagelist_highmark(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	struct per_cpu_pages *pcp;
-
-	pcp = &p->pcp[0]; /* hot list */
-	pcp->high = high;
-	pcp->batch = max(1UL, high/4);
-	if ((high/4) > (PAGE_SHIFT * 8))
-		pcp->batch = PAGE_SHIFT * 8;
-}
-
-
-#ifdef CONFIG_NUMA
-/*
- * Boot pageset table. One per cpu which is going to be used for all
- * zones and all nodes. The parameters will be set in such a way
- * that an item put on a list will immediately be handed over to
- * the buddy list. This is safe since pageset manipulation is done
- * with interrupts disabled.
- *
- * Some NUMA counter updates may also be caught by the boot pagesets.
- *
- * The boot_pagesets must be kept even after bootup is complete for
- * unused processors and/or zones. They do play a role for bootstrapping
- * hotplugged processors.
- *
- * zoneinfo_show() and maybe other functions do
- * not check if the processor is online before following the pageset pointer.
- * Other parts of the kernel may not check if the zone is available.
- */
-static struct per_cpu_pageset boot_pageset[NR_CPUS];
-
-/*
- * Dynamically allocate memory for the
- * per cpu pageset array in struct zone.
- */
-static int __cpuinit process_zones(int cpu)
-{
-	struct zone *zone, *dzone;
-
-	for_each_zone(zone) {
-
-		zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
-					 GFP_KERNEL, cpu_to_node(cpu));
-		if (!zone_pcp(zone, cpu))
-			goto bad;
-
-		setup_pageset(zone_pcp(zone, cpu), zone_batchsize(zone));
-
-		if (percpu_pagelist_fraction)
-			setup_pagelist_highmark(zone_pcp(zone, cpu),
-			 	(zone->present_pages / percpu_pagelist_fraction));
-	}
-
-	return 0;
-bad:
-	for_each_zone(dzone) {
-		if (dzone == zone)
-			break;
-		kfree(zone_pcp(dzone, cpu));
-		zone_pcp(dzone, cpu) = NULL;
-	}
-	return -ENOMEM;
-}
-
-static inline void free_zone_pagesets(int cpu)
-{
-	struct zone *zone;
-
-	for_each_zone(zone) {
-		struct per_cpu_pageset *pset = zone_pcp(zone, cpu);
-
-		zone_pcp(zone, cpu) = NULL;
-		kfree(pset);
-	}
-}
-
-static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
-		unsigned long action,
-		void *hcpu)
-{
-	int cpu = (long)hcpu;
-	int ret = NOTIFY_OK;
-
-	switch (action) {
-		case CPU_UP_PREPARE:
-			if (process_zones(cpu))
-				ret = NOTIFY_BAD;
-			break;
-		case CPU_UP_CANCELED:
-		case CPU_DEAD:
-			free_zone_pagesets(cpu);
-			break;
-		default:
-			break;
-	}
-	return ret;
-}
-
-static struct notifier_block pageset_notifier =
-	{ &pageset_cpuup_callback, NULL, 0 };
-
-void __init setup_per_cpu_pageset(void)
-{
-	int err;
-
-	/* Initialize per_cpu_pageset for cpu 0.
-	 * A cpuup callback will do this for every cpu
-	 * as it comes online
-	 */
-	err = process_zones(smp_processor_id());
-	BUG_ON(err);
-	register_cpu_notifier(&pageset_notifier);
-}
-
-#endif
-
-static __meminit
-void zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
-{
-	int i;
-	struct pglist_data *pgdat = zone->zone_pgdat;
-
-	/*
-	 * The per-page waitqueue mechanism uses hashed waitqueues
-	 * per zone.
-	 */
-	zone->wait_table_size = wait_table_size(zone_size_pages);
-	zone->wait_table_bits =	wait_table_bits(zone->wait_table_size);
-	zone->wait_table = (wait_queue_head_t *)
-		alloc_bootmem_node(pgdat, zone->wait_table_size
-					* sizeof(wait_queue_head_t));
-
-	for(i = 0; i < zone->wait_table_size; ++i)
-		init_waitqueue_head(zone->wait_table + i);
-}
-
-static __meminit void zone_pcp_init(struct zone *zone)
-{
-	int cpu;
-	unsigned long batch = zone_batchsize(zone);
-
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-#ifdef CONFIG_NUMA
-		/* Early boot. Slab allocator not functional yet */
-		zone_pcp(zone, cpu) = &boot_pageset[cpu];
-		setup_pageset(&boot_pageset[cpu],0);
-#else
-		setup_pageset(zone_pcp(zone,cpu), batch);
-#endif
-	}
-	if (zone->present_pages)
-		printk(KERN_DEBUG "  %s zone: %lu pages, LIFO batch:%lu\n",
-			zone->name, zone->present_pages, batch);
-}
-
-static __meminit void init_currently_empty_zone(struct zone *zone,
-		unsigned long zone_start_pfn, unsigned long size)
-{
-	struct pglist_data *pgdat = zone->zone_pgdat;
-
-	zone_wait_table_init(zone, size);
-	pgdat->nr_zones = zone_idx(zone) + 1;
-
-	zone->zone_start_pfn = zone_start_pfn;
-
-	memmap_init(size, pgdat->node_id, zone_idx(zone), zone_start_pfn);
-
-	zone_init_free_lists(pgdat, zone, zone->spanned_pages);
-}
-
-#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
-/* Note: nid == MAX_NUMNODES returns first region */
-static int __init first_active_region_index_in_nid(int nid)
-{
-	int i;
-	for (i = 0; early_node_map[i].end_pfn; i++) {
-		if (nid == MAX_NUMNODES || early_node_map[i].nid == nid)
-			return i;
-	}
-
-	return MAX_ACTIVE_REGIONS;
-}
-
-/* Note: nid == MAX_NUMNODES returns next region */
-static int __init next_active_region_index_in_nid(unsigned int index, int nid)
-{
-	for (index = index + 1; early_node_map[index].end_pfn; index++) {
-		if (nid == MAX_NUMNODES || early_node_map[index].nid == nid)
-			return index;
-	}
-
-	return MAX_ACTIVE_REGIONS;
-}
-
-#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
-int __init early_pfn_to_nid(unsigned long pfn)
-{
-	int i;
-
-	for (i = 0; early_node_map[i].end_pfn; i++) {
-		unsigned long start_pfn = early_node_map[i].start_pfn;
-		unsigned long end_pfn = early_node_map[i].end_pfn;
-
-		if ((start_pfn <= pfn) && (pfn < end_pfn))
-			return early_node_map[i].nid;
-	}
-
-	return -1;
-}
-#endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
-
-#define for_each_active_range_index_in_nid(i, nid) \
-	for (i = first_active_region_index_in_nid(nid); \
-				i != MAX_ACTIVE_REGIONS; \
-				i = next_active_region_index_in_nid(i, nid))
-
-void __init free_bootmem_with_active_regions(int nid,
-						unsigned long max_low_pfn)
-{
-	unsigned int i;
-	for_each_active_range_index_in_nid(i, nid) {
-		unsigned long size_pages = 0;
-		unsigned long end_pfn = early_node_map[i].end_pfn;
-		if (early_node_map[i].start_pfn >= max_low_pfn)
-			continue;
-
-		if (end_pfn > max_low_pfn)
-			end_pfn = max_low_pfn;
-
-		size_pages = end_pfn - early_node_map[i].start_pfn;
-		free_bootmem_node(NODE_DATA(early_node_map[i].nid),
-				PFN_PHYS(early_node_map[i].start_pfn),
-				PFN_PHYS(size_pages));
-	}
-}
-
-void __init sparse_memory_present_with_active_regions(int nid)
-{
-	unsigned int i;
-	for_each_active_range_index_in_nid(i, nid)
-		memory_present(early_node_map[i].nid,
-				early_node_map[i].start_pfn,
-				early_node_map[i].end_pfn);
-}
-
-void __init get_pfn_range_for_nid(unsigned int nid,
-			unsigned long *start_pfn, unsigned long *end_pfn)
-{
-	unsigned int i;
-	*start_pfn = -1UL;
-	*end_pfn = 0;
-
-	for_each_active_range_index_in_nid(i, nid) {
-		if (early_node_map[i].start_pfn < *start_pfn)
-			*start_pfn = early_node_map[i].start_pfn;
-
-		if (early_node_map[i].end_pfn > *end_pfn)
-			*end_pfn = early_node_map[i].end_pfn;
-	}
-
-	if (*start_pfn == -1UL) {
-		printk(KERN_WARNING "Node %u active with no memory\n", nid);
-		*start_pfn = 0;
-	}
-}
-
-unsigned long __init zone_present_pages_in_node(int nid,
-					unsigned long zone_type,
-					unsigned long *ignored)
-{
-	unsigned long node_start_pfn, node_end_pfn;
-	unsigned long zone_start_pfn, zone_end_pfn;
-
-	/* Get the start and end of the node and zone */
-	get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
-	zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
-	zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
-
-	/* Check that this node has pages within the zone's required range */
-	if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
-		return 0;
-
-	/* Move the zone boundaries inside the node if necessary */
-	if (zone_end_pfn > node_end_pfn)
-		zone_end_pfn = node_end_pfn;
-	if (zone_start_pfn < node_start_pfn)
-		zone_start_pfn = node_start_pfn;
-
-	/* Return the spanned pages */
-	return zone_end_pfn - zone_start_pfn;
-}
-
-unsigned long __init __absent_pages_in_range(int nid,
-				unsigned long range_start_pfn,
-				unsigned long range_end_pfn)
-{
-	int i = 0;
-	unsigned long prev_end_pfn = 0, hole_pages = 0;
-	unsigned long start_pfn;
-
-	/* Find the end_pfn of the first active range of pfns in the node */
-	i = first_active_region_index_in_nid(nid);
-	prev_end_pfn = early_node_map[i].start_pfn;
-
-	/* Find all holes for the zone within the node */
-	for (; i != MAX_ACTIVE_REGIONS;
-			i = next_active_region_index_in_nid(i, nid)) {
-
-		/* No need to continue if prev_end_pfn is outside the zone */
-		if (prev_end_pfn >= range_end_pfn)
-			break;
-
-		/* Make sure the end of the zone is not within the hole */
-		start_pfn = early_node_map[i].start_pfn;
-		if (start_pfn > range_end_pfn)
-			start_pfn = range_end_pfn;
-		if (prev_end_pfn < range_start_pfn)
-			prev_end_pfn = range_start_pfn;
-
-		/* Update the hole size cound and move on */
-		if (start_pfn > range_start_pfn) {
-			BUG_ON(prev_end_pfn > start_pfn);
-			hole_pages += start_pfn - prev_end_pfn;
-		}
-		prev_end_pfn = early_node_map[i].end_pfn;
-	}
-
-	return hole_pages;
-}
-
-unsigned long __init absent_pages_in_range(unsigned long start_pfn,
-							unsigned long end_pfn)
-{
-	return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn);
-}
-
-unsigned long __init zone_absent_pages_in_node(int nid,
-					unsigned long zone_type,
-					unsigned long *ignored)
-{
-	return __absent_pages_in_range(nid,
-				arch_zone_lowest_possible_pfn[zone_type],
-				arch_zone_highest_possible_pfn[zone_type]);
-}
-#else
-static inline unsigned long zone_present_pages_in_node(int nid,
-					unsigned long zone_type,
-					unsigned long *zones_size)
-{
-	return zones_size[zone_type];
-}
-
-static inline unsigned long zone_absent_pages_in_node(int nid,
-						unsigned long zone_type,
-						unsigned long *zholes_size)
-{
-	if (!zholes_size)
-		return 0;
-
-	return zholes_size[zone_type];
-}
-#endif
-
-static void __init calculate_node_totalpages(struct pglist_data *pgdat,
-		unsigned long *zones_size, unsigned long *zholes_size)
-{
-	unsigned long realtotalpages, totalpages = 0;
-	int i;
-
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		totalpages += zone_present_pages_in_node(pgdat->node_id, i,
-								zones_size);
-	}
-	pgdat->node_spanned_pages = totalpages;
-
-	realtotalpages = totalpages;
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		realtotalpages -=
-			zone_absent_pages_in_node(pgdat->node_id, i, zholes_size);
-	}
-	pgdat->node_present_pages = realtotalpages;
-	printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id,
-							realtotalpages);
-}
-
-/*
- * Set up the zone data structures:
- *   - mark all pages reserved
- *   - mark all memory queues empty
- *   - clear the memory bitmaps
- */
-static void __init free_area_init_core(struct pglist_data *pgdat,
-		unsigned long *zones_size, unsigned long *zholes_size)
-{
-	unsigned long j;
-	int nid = pgdat->node_id;
-	unsigned long zone_start_pfn = pgdat->node_start_pfn;
-
-	pgdat_resize_init(pgdat);
-	pgdat->nr_zones = 0;
-	init_waitqueue_head(&pgdat->kswapd_wait);
-	pgdat->kswapd_max_order = 0;
-	
-	for (j = 0; j < MAX_NR_ZONES; j++) {
-		struct zone *zone = pgdat->node_zones + j;
-		unsigned long size, realsize;
-
-		size = zone_present_pages_in_node(nid, j, zones_size);
-		realsize = size - zone_absent_pages_in_node(nid, j,
-								zholes_size);
-		if (j < ZONE_HIGHMEM)
-			nr_kernel_pages += realsize;
-		nr_all_pages += realsize;
-
-		zone->spanned_pages = size;
-		zone->present_pages = realsize;
-		zone->name = zone_names[j];
-		spin_lock_init(&zone->lock);
-		spin_lock_init(&zone->lru_lock);
-		zone_seqlock_init(zone);
-		zone->zone_pgdat = pgdat;
-		zone->free_pages = 0;
-
-		zone->temp_priority = zone->prev_priority = DEF_PRIORITY;
-
-		zone_pcp_init(zone);
-		INIT_LIST_HEAD(&zone->active_list);
-		INIT_LIST_HEAD(&zone->inactive_list);
-		zone->nr_scan_active = 0;
-		zone->nr_scan_inactive = 0;
-		zone->nr_active = 0;
-		zone->nr_inactive = 0;
-		atomic_set(&zone->reclaim_in_progress, 0);
-		if (!size)
-			continue;
-
-		zonetable_add(zone, nid, j, zone_start_pfn, size);
-		init_currently_empty_zone(zone, zone_start_pfn, size);
-		zone_start_pfn += size;
-	}
-}
-
-static void __init alloc_node_mem_map(struct pglist_data *pgdat)
-{
-	/* Skip empty nodes */
-	if (!pgdat->node_spanned_pages)
-		return;
-
-#ifdef CONFIG_FLAT_NODE_MEM_MAP
-	/* ia64 gets its own node_mem_map, before this, without bootmem */
-	if (!pgdat->node_mem_map) {
-		unsigned long size;
-		struct page *map;
-
-		size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
-		map = alloc_remap(pgdat->node_id, size);
-		if (!map)
-			map = alloc_bootmem_node(pgdat, size);
-		pgdat->node_mem_map = map;
-	}
-#ifdef CONFIG_FLATMEM
-	/*
-	 * With no DISCONTIG, the global mem_map is just set as node 0's
-	 */
-	if (pgdat == NODE_DATA(0))
-		mem_map = NODE_DATA(0)->node_mem_map;
-#endif
-#endif /* CONFIG_FLAT_NODE_MEM_MAP */
-}
-
-void __init free_area_init_node(int nid, struct pglist_data *pgdat,
-		unsigned long *zones_size, unsigned long node_start_pfn,
-		unsigned long *zholes_size)
-{
-	pgdat->node_id = nid;
-	pgdat->node_start_pfn = node_start_pfn;
-	calculate_node_totalpages(pgdat, zones_size, zholes_size);
-
-	alloc_node_mem_map(pgdat);
-
-	free_area_init_core(pgdat, zones_size, zholes_size);
-}
-
-#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
-void __init add_active_range(unsigned int nid, unsigned long start_pfn,
-						unsigned long end_pfn)
-{
-	unsigned int i;
-
-	/* Merge with existing active regions if possible */
-	for (i = 0; early_node_map[i].end_pfn; i++) {
-		if (early_node_map[i].nid != nid)
-			continue;
-
-		/* Skip if an existing region covers this new one */
-		if (start_pfn >= early_node_map[i].start_pfn &&
-				end_pfn <= early_node_map[i].end_pfn)
-			return;
-
-		/* Merge forward if suitable */
-		if (start_pfn <= early_node_map[i].end_pfn &&
-				end_pfn > early_node_map[i].end_pfn) {
-			early_node_map[i].end_pfn = end_pfn;
-			return;
-		}
-
-		/* Merge backward if suitable */
-		if (start_pfn < early_node_map[i].end_pfn &&
-				end_pfn >= early_node_map[i].start_pfn) {
-			early_node_map[i].start_pfn = start_pfn;
-			return;
-		}
-	}
-
-	/* Leave last entry NULL, we use range.end_pfn to terminate the walk */
-	if (i >= MAX_ACTIVE_REGIONS - 1) {
-		printk(KERN_ERR "Too many memory regions, truncating\n");
-		return;
-	}
-
-	early_node_map[i].nid = nid;
-	early_node_map[i].start_pfn = start_pfn;
-	early_node_map[i].end_pfn = end_pfn;
-}
-
-void __init remove_all_active_ranges()
-{
-	memset(early_node_map, 0, sizeof(early_node_map));
-}
-
-/* Compare two active node_active_regions */
-static int __init cmp_node_active_region(const void *a, const void *b)
-{
-	struct node_active_region *arange = (struct node_active_region *)a;
-	struct node_active_region *brange = (struct node_active_region *)b;
-
-	/* Done this way to avoid overflows */
-	if (arange->start_pfn > brange->start_pfn)
-		return 1;
-	if (arange->start_pfn < brange->start_pfn)
-		return -1;
-
-	return 0;
-}
-
-/* sort the node_map by start_pfn */
-static void __init sort_node_map(void)
-{
-	size_t num = 0;
-	while (early_node_map[num].end_pfn)
-		num++;
-
-	sort(early_node_map, num, sizeof(struct node_active_region),
-						cmp_node_active_region, NULL);
-}
-
-/* Find the lowest pfn for a node. This depends on a sorted early_node_map */
-unsigned long __init find_min_pfn_for_node(unsigned long nid)
-{
-	int i;
-
-	/* Assuming a sorted map, the first range found has the starting pfn */
-	for_each_active_range_index_in_nid(i, nid)
-		return early_node_map[i].start_pfn;
-
-	/* nid does not exist in early_node_map */
-	printk(KERN_WARNING "Could not find start_pfn for node %lu\n", nid);
-	return 0;
-}
-
-unsigned long __init find_min_pfn_with_active_regions(void)
-{
-	return find_min_pfn_for_node(MAX_NUMNODES);
-}
-
-unsigned long __init find_max_pfn_with_active_regions(void)
-{
-	int i;
-	unsigned long max_pfn = 0;
-
-	for (i = 0; early_node_map[i].end_pfn; i++)
-		max_pfn = max(max_pfn, early_node_map[i].end_pfn);
-
-	return max_pfn;
-}
-
-void __init free_area_init_nodes(unsigned long arch_max_dma_pfn,
-				unsigned long arch_max_dma32_pfn,
-				unsigned long arch_max_low_pfn,
-				unsigned long arch_max_high_pfn)
-{
-	unsigned long nid;
-	int zone_index;
-
-	/* Record where the zone boundaries are */
-	memset(arch_zone_lowest_possible_pfn, 0,
-				sizeof(arch_zone_lowest_possible_pfn));
-	memset(arch_zone_highest_possible_pfn, 0,
-				sizeof(arch_zone_highest_possible_pfn));
-	arch_zone_lowest_possible_pfn[ZONE_DMA] =
-					find_min_pfn_with_active_regions();
-	arch_zone_highest_possible_pfn[ZONE_DMA] = arch_max_dma_pfn;
-	arch_zone_highest_possible_pfn[ZONE_DMA32] = arch_max_dma32_pfn;
-	arch_zone_highest_possible_pfn[ZONE_NORMAL] = arch_max_low_pfn;
-	arch_zone_highest_possible_pfn[ZONE_HIGHMEM] = arch_max_high_pfn;
-	for (zone_index = 1; zone_index < MAX_NR_ZONES; zone_index++) {
-		arch_zone_lowest_possible_pfn[zone_index] =
-			arch_zone_highest_possible_pfn[zone_index-1];
-	}
-
-	/* Regions in the early_node_map can be in any order */
-	sort_node_map();
-
-	for_each_online_node(nid) {
-		pg_data_t *pgdat = NODE_DATA(nid);
-		free_area_init_node(nid, pgdat, NULL,
-				find_min_pfn_for_node(nid), NULL);
-	}
-}
-#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
-
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 static bootmem_data_t contig_bootmem_data;
 struct pglist_data contig_page_data = { .bdata = &contig_bootmem_data };
@@ -3020,32 +2021,6 @@ int lowmem_reserve_ratio_sysctl_handler(
 	return 0;
 }
 
-/*
- * percpu_pagelist_fraction - changes the pcp->high for each zone on each
- * cpu.  It is the fraction of total pages in each zone that a hot per cpu pagelist
- * can have before it gets flushed back to buddy allocator.
- */
-
-int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
-	struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
-{
-	struct zone *zone;
-	unsigned int cpu;
-	int ret;
-
-	ret = proc_dointvec_minmax(table, write, file, buffer, length, ppos);
-	if (!write || (ret == -EINVAL))
-		return ret;
-	for_each_zone(zone) {
-		for_each_online_cpu(cpu) {
-			unsigned long  high;
-			high = zone->present_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(zone_pcp(zone, cpu), high);
-		}
-	}
-	return 0;
-}
-
 __initdata int hashdist = HASHDIST_DEFAULT;
 
 #ifdef CONFIG_NUMA

^ permalink raw reply

* [PATCH 5/7] Have ia64 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-04-24 20:21 UTC (permalink / raw)
  To: davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
	linuxppc-dev
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


Size zones and holes in an architecture independent manner for ia64.

This has only been compile-tested due to lack of a suitable test machine.


 arch/ia64/Kconfig          |    3 ++
 arch/ia64/mm/contig.c      |   60 +++++-----------------------------------
 arch/ia64/mm/discontig.c   |   41 ++++-----------------------
 arch/ia64/mm/init.c        |   12 ++++++++
 include/asm-ia64/meminit.h |    1 
 5 files changed, 30 insertions(+), 87 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/Kconfig linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/Kconfig
--- linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/Kconfig	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/Kconfig	2006-04-24 15:52:34.000000000 +0100
@@ -353,6 +353,9 @@ config NODES_SHIFT
 	  MAX_NUMNODES will be 2^(This value).
 	  If in doubt, use the default.
 
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
+
 # VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent.
 # VIRTUAL_MEM_MAP has been retained for historical reasons.
 config VIRTUAL_MEM_MAP
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/mm/contig.c linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/mm/contig.c
--- linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/mm/contig.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/mm/contig.c	2006-04-24 15:52:34.000000000 +0100
@@ -26,10 +26,6 @@
 #include <asm/sections.h>
 #include <asm/mca.h>
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-static unsigned long num_dma_physpages;
-#endif
-
 /**
  * show_mem - display a memory statistics summary
  *
@@ -212,18 +208,6 @@ count_pages (u64 start, u64 end, void *a
 	return 0;
 }
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-static int
-count_dma_pages (u64 start, u64 end, void *arg)
-{
-	unsigned long *count = arg;
-
-	if (start < MAX_DMA_ADDRESS)
-		*count += (min(end, MAX_DMA_ADDRESS) - start) >> PAGE_SHIFT;
-	return 0;
-}
-#endif
-
 /*
  * Set up the page tables.
  */
@@ -232,47 +216,24 @@ void __init
 paging_init (void)
 {
 	unsigned long max_dma;
-	unsigned long zones_size[MAX_NR_ZONES];
 #ifdef CONFIG_VIRTUAL_MEM_MAP
-	unsigned long zholes_size[MAX_NR_ZONES];
+	unsigned long nid = 0;
 	unsigned long max_gap;
 #endif
 
-	/* initialize mem_map[] */
-
-	memset(zones_size, 0, sizeof(zones_size));
-
 	num_physpages = 0;
 	efi_memmap_walk(count_pages, &num_physpages);
 
 	max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT;
 
 #ifdef CONFIG_VIRTUAL_MEM_MAP
-	memset(zholes_size, 0, sizeof(zholes_size));
-
-	num_dma_physpages = 0;
-	efi_memmap_walk(count_dma_pages, &num_dma_physpages);
-
-	if (max_low_pfn < max_dma) {
-		zones_size[ZONE_DMA] = max_low_pfn;
-		zholes_size[ZONE_DMA] = max_low_pfn - num_dma_physpages;
-	} else {
-		zones_size[ZONE_DMA] = max_dma;
-		zholes_size[ZONE_DMA] = max_dma - num_dma_physpages;
-		if (num_physpages > num_dma_physpages) {
-			zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
-			zholes_size[ZONE_NORMAL] =
-				((max_low_pfn - max_dma) -
-				 (num_physpages - num_dma_physpages));
-		}
-	}
-
 	max_gap = 0;
+	efi_memmap_walk(register_active_ranges, &nid);
 	efi_memmap_walk(find_largest_hole, (u64 *)&max_gap);
 	if (max_gap < LARGE_GAP) {
 		vmem_map = (struct page *) 0;
-		free_area_init_node(0, NODE_DATA(0), zones_size, 0,
-				    zholes_size);
+		free_area_init_nodes(max_dma, max_dma,
+				max_low_pfn, max_low_pfn);
 	} else {
 		unsigned long map_size;
 
@@ -284,19 +245,14 @@ paging_init (void)
 		efi_memmap_walk(create_mem_map_page_table, NULL);
 
 		NODE_DATA(0)->node_mem_map = vmem_map;
-		free_area_init_node(0, NODE_DATA(0), zones_size,
-				    0, zholes_size);
+		free_area_init_nodes(max_dma, max_dma,
+				max_low_pfn, max_low_pfn);
 
 		printk("Virtual mem_map starts at 0x%p\n", mem_map);
 	}
 #else /* !CONFIG_VIRTUAL_MEM_MAP */
-	if (max_low_pfn < max_dma)
-		zones_size[ZONE_DMA] = max_low_pfn;
-	else {
-		zones_size[ZONE_DMA] = max_dma;
-		zones_size[ZONE_NORMAL] = max_low_pfn - max_dma;
-	}
-	free_area_init(zones_size);
+	add_active_range(0, 0, max_low_pfn);
+	free_area_init_nodes(max_dma, max_dma, max_low_pfn, max_low_pfn);
 #endif /* !CONFIG_VIRTUAL_MEM_MAP */
 	zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
 }
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/mm/discontig.c linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/mm/discontig.c
--- linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/mm/discontig.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/mm/discontig.c	2006-04-24 15:52:34.000000000 +0100
@@ -700,6 +700,7 @@ static __init int count_node_pages(unsig
 {
 	unsigned long end = start + len;
 
+	add_active_range(node, start >> PAGE_SHIFT, end >> PAGE_SHIFT);
 	mem_data[node].num_physpages += len >> PAGE_SHIFT;
 	if (start <= __pa(MAX_DMA_ADDRESS))
 		mem_data[node].num_dma_physpages +=
@@ -724,9 +725,8 @@ static __init int count_node_pages(unsig
 void __init paging_init(void)
 {
 	unsigned long max_dma;
-	unsigned long zones_size[MAX_NR_ZONES];
-	unsigned long zholes_size[MAX_NR_ZONES];
 	unsigned long pfn_offset = 0;
+	unsigned long max_pfn = 0;
 	int node;
 
 	max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT;
@@ -743,46 +743,17 @@ void __init paging_init(void)
 #endif
 
 	for_each_online_node(node) {
-		memset(zones_size, 0, sizeof(zones_size));
-		memset(zholes_size, 0, sizeof(zholes_size));
-
 		num_physpages += mem_data[node].num_physpages;
-
-		if (mem_data[node].min_pfn >= max_dma) {
-			/* All of this node's memory is above ZONE_DMA */
-			zones_size[ZONE_NORMAL] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn;
-			zholes_size[ZONE_NORMAL] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn -
-				mem_data[node].num_physpages;
-		} else if (mem_data[node].max_pfn < max_dma) {
-			/* All of this node's memory is in ZONE_DMA */
-			zones_size[ZONE_DMA] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn;
-			zholes_size[ZONE_DMA] = mem_data[node].max_pfn -
-				mem_data[node].min_pfn -
-				mem_data[node].num_dma_physpages;
-		} else {
-			/* This node has memory in both zones */
-			zones_size[ZONE_DMA] = max_dma -
-				mem_data[node].min_pfn;
-			zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] -
-				mem_data[node].num_dma_physpages;
-			zones_size[ZONE_NORMAL] = mem_data[node].max_pfn -
-				max_dma;
-			zholes_size[ZONE_NORMAL] = zones_size[ZONE_NORMAL] -
-				(mem_data[node].num_physpages -
-				 mem_data[node].num_dma_physpages);
-		}
-
 		pfn_offset = mem_data[node].min_pfn;
 
 #ifdef CONFIG_VIRTUAL_MEM_MAP
 		NODE_DATA(node)->node_mem_map = vmem_map + pfn_offset;
 #endif
-		free_area_init_node(node, NODE_DATA(node), zones_size,
-				    pfn_offset, zholes_size);
+		if (mem_data[node].max_pfn > max_pfn)
+			max_pfn = mem_data[node].max_pfn;
 	}
 
+	free_area_init_nodes(max_dma, max_dma, max_pfn, max_pfn);
+
 	zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
 }
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/mm/init.c linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/mm/init.c
--- linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/ia64/mm/init.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-105-ia64_use_init_nodes/arch/ia64/mm/init.c	2006-04-24 15:52:34.000000000 +0100
@@ -539,6 +539,18 @@ find_largest_hole (u64 start, u64 end, v
 	last_end = end;
 	return 0;
 }
+
+int __init
+register_active_ranges(u64 start, u64 end, void *nid)
+{
+	BUG_ON(nid == NULL);
+	BUG_ON(*(unsigned long *)nid >= MAX_NUMNODES);
+
+	add_active_range(*(unsigned long *)nid,
+				__pa(start) >> PAGE_SHIFT,
+				__pa(end) >> PAGE_SHIFT);
+	return 0;
+}
 #endif /* CONFIG_VIRTUAL_MEM_MAP */
 
 static int __init
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-104-x86_64_use_init_nodes/include/asm-ia64/meminit.h linux-2.6.17-rc2-105-ia64_use_init_nodes/include/asm-ia64/meminit.h
--- linux-2.6.17-rc2-104-x86_64_use_init_nodes/include/asm-ia64/meminit.h	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-105-ia64_use_init_nodes/include/asm-ia64/meminit.h	2006-04-24 15:52:34.000000000 +0100
@@ -56,6 +56,7 @@ extern void efi_memmap_init(unsigned lon
   extern unsigned long vmalloc_end;
   extern struct page *vmem_map;
   extern int find_largest_hole (u64 start, u64 end, void *arg);
+  extern int register_active_ranges (u64 start, u64 end, void *arg);
   extern int create_mem_map_page_table (u64 start, u64 end, void *arg);
 #endif
 

^ permalink raw reply

* [PATCH 4/7] Have x86_64 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-04-24 20:21 UTC (permalink / raw)
  To: davej, tony.luck, linuxppc-dev, linux-kernel, bob.picco, ak,
	linux-mm
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


Size zones and holes in an architecture independent manner for x86_64.

This has only been boot tested on an x86_64 with NUMA and SRAT.


 arch/x86_64/Kconfig         |    3 +
 arch/x86_64/kernel/e820.c   |  109 ++++++++++-----------------------------
 arch/x86_64/kernel/setup.c  |    7 ++
 arch/x86_64/mm/init.c       |   62 +---------------------
 arch/x86_64/mm/k8topology.c |    3 +
 arch/x86_64/mm/numa.c       |   18 +++---
 arch/x86_64/mm/srat.c       |   11 ++-
 include/asm-x86_64/e820.h   |    5 -
 include/asm-x86_64/proto.h  |    2 
 9 files changed, 63 insertions(+), 157 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/Kconfig linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/Kconfig
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/Kconfig	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/Kconfig	2006-04-24 15:51:46.000000000 +0100
@@ -73,6 +73,9 @@ config ARCH_MAY_HAVE_PC_FDC
 	bool
 	default y
 
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
+
 config DMI
 	bool
 	default y
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/kernel/e820.c linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/kernel/e820.c
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/kernel/e820.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/kernel/e820.c	2006-04-24 15:51:46.000000000 +0100
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/kexec.h>
 #include <linux/module.h>
+#include <linux/mm.h>
 
 #include <asm/page.h>
 #include <asm/e820.h>
@@ -155,58 +156,14 @@ unsigned long __init find_e820_area(unsi
 	return -1UL;		
 } 
 
-/* 
- * Free bootmem based on the e820 table for a node.
- */
-void __init e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end)
-{
-	int i;
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i]; 
-		unsigned long last, addr;
-
-		if (ei->type != E820_RAM || 
-		    ei->addr+ei->size <= start || 
-		    ei->addr >= end)
-			continue;
-
-		addr = round_up(ei->addr, PAGE_SIZE);
-		if (addr < start) 
-			addr = start;
-
-		last = round_down(ei->addr + ei->size, PAGE_SIZE); 
-		if (last >= end)
-			last = end; 
-
-		if (last > addr && last-addr >= PAGE_SIZE)
-			free_bootmem_node(pgdat, addr, last-addr);
-	}
-}
-
 /*
  * Find the highest page frame number we have available
  */
 unsigned long __init e820_end_of_ram(void)
 {
-	int i;
 	unsigned long end_pfn = 0;
 	
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i]; 
-		unsigned long start, end;
-
-		start = round_up(ei->addr, PAGE_SIZE); 
-		end = round_down(ei->addr + ei->size, PAGE_SIZE); 
-		if (start >= end)
-			continue;
-		if (ei->type == E820_RAM) { 
-		if (end > end_pfn<<PAGE_SHIFT)
-			end_pfn = end>>PAGE_SHIFT;
-		} else { 
-			if (end > end_pfn_map<<PAGE_SHIFT) 
-				end_pfn_map = end>>PAGE_SHIFT;
-		} 
-	}
+	end_pfn = find_max_pfn_with_active_regions();
 
 	if (end_pfn > end_pfn_map) 
 		end_pfn_map = end_pfn;
@@ -220,40 +177,6 @@ unsigned long __init e820_end_of_ram(voi
 	return end_pfn;	
 }
 
-/* 
- * Compute how much memory is missing in a range.
- * Unlike the other functions in this file the arguments are in page numbers.
- */
-unsigned long __init
-e820_hole_size(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long ram = 0;
-	unsigned long start = start_pfn << PAGE_SHIFT;
-	unsigned long end = end_pfn << PAGE_SHIFT;
-	int i;
-	for (i = 0; i < e820.nr_map; i++) {
-		struct e820entry *ei = &e820.map[i];
-		unsigned long last, addr;
-
-		if (ei->type != E820_RAM ||
-		    ei->addr+ei->size <= start ||
-		    ei->addr >= end)
-			continue;
-
-		addr = round_up(ei->addr, PAGE_SIZE);
-		if (addr < start)
-			addr = start;
-
-		last = round_down(ei->addr + ei->size, PAGE_SIZE);
-		if (last >= end)
-			last = end;
-
-		if (last > addr)
-			ram += last - addr;
-	}
-	return ((end - start) - ram) >> PAGE_SHIFT;
-}
-
 /*
  * Mark e820 reserved areas as busy for the resource manager.
  */
@@ -288,6 +211,34 @@ void __init e820_reserve_resources(void)
 	}
 }
 
+/* Walk the e820 map and register active regions within a node */
+void __init
+e820_register_active_regions(int nid, unsigned long start_pfn,
+							unsigned long end_pfn)
+{
+	int i;
+	unsigned long ei_startpfn, ei_endpfn;
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		ei_startpfn = round_up(ei->addr, PAGE_SIZE) >> PAGE_SHIFT;
+		ei_endpfn = round_down(ei->addr + ei->size, PAGE_SIZE)
+								>> PAGE_SHIFT;
+		/* Skip if map is outside the node */
+		if (ei->type != E820_RAM ||
+				ei_endpfn <= start_pfn ||
+				ei_startpfn >= end_pfn)
+			continue;
+
+		/* Check for overlaps */
+		if (ei_startpfn < start_pfn)
+			ei_startpfn = start_pfn;
+		if (ei_endpfn > end_pfn)
+			ei_endpfn = end_pfn;
+
+		add_active_range(nid, ei_startpfn, ei_endpfn);
+	}
+}
+
 /* 
  * Add a memory region to the kernel e820 map.
  */ 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/kernel/setup.c linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/kernel/setup.c
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/kernel/setup.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/kernel/setup.c	2006-04-24 16:13:15.000000000 +0100
@@ -468,7 +468,8 @@ contig_initmem_init(unsigned long start_
 	if (bootmap == -1L)
 		panic("Cannot find bootmem map of size %ld\n",bootmap_size);
 	bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);
-	e820_bootmem_free(NODE_DATA(0), 0, end_pfn << PAGE_SHIFT);
+	e820_register_active_regions(0, start_pfn, end_pfn);
+	free_bootmem_with_active_regions(0, end_pfn);
 	reserve_bootmem(bootmap, bootmap_size);
 } 
 #endif
@@ -618,6 +619,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_identify_cpu(&boot_cpu_data);
 
+	e820_register_active_regions(0, 0, -1UL);
 	/*
 	 * partially used pages are not usable - thus
 	 * we are rounding upwards:
@@ -641,6 +643,9 @@ void __init setup_arch(char **cmdline_p)
 	acpi_boot_table_init();
 #endif
 
+	/* Remove active ranges so rediscovery with NUMA-awareness happens */
+	remove_all_active_ranges();
+
 #ifdef CONFIG_ACPI_NUMA
 	/*
 	 * Parse SRAT to discover nodes.
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/init.c linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/init.c
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/init.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/init.c	2006-04-24 15:51:46.000000000 +0100
@@ -405,69 +405,12 @@ void __cpuinit zap_low_mappings(int cpu)
 	__flush_tlb_all();
 }
 
-/* Compute zone sizes for the DMA and DMA32 zones in a node. */
-__init void
-size_zones(unsigned long *z, unsigned long *h,
-	   unsigned long start_pfn, unsigned long end_pfn)
-{
- 	int i;
- 	unsigned long w;
-
- 	for (i = 0; i < MAX_NR_ZONES; i++)
- 		z[i] = 0;
-
- 	if (start_pfn < MAX_DMA_PFN)
- 		z[ZONE_DMA] = MAX_DMA_PFN - start_pfn;
- 	if (start_pfn < MAX_DMA32_PFN) {
- 		unsigned long dma32_pfn = MAX_DMA32_PFN;
- 		if (dma32_pfn > end_pfn)
- 			dma32_pfn = end_pfn;
- 		z[ZONE_DMA32] = dma32_pfn - start_pfn;
- 	}
- 	z[ZONE_NORMAL] = end_pfn - start_pfn;
-
- 	/* Remove lower zones from higher ones. */
- 	w = 0;
- 	for (i = 0; i < MAX_NR_ZONES; i++) {
- 		if (z[i])
- 			z[i] -= w;
- 	        w += z[i];
-	}
-
-	/* Compute holes */
-	w = start_pfn;
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		unsigned long s = w;
-		w += z[i];
-		h[i] = e820_hole_size(s, w);
-	}
-
-	/* Add the space pace needed for mem_map to the holes too. */
-	for (i = 0; i < MAX_NR_ZONES; i++)
-		h[i] += (z[i] * sizeof(struct page)) / PAGE_SIZE;
-
-	/* The 16MB DMA zone has the kernel and other misc mappings.
- 	   Account them too */
-	if (h[ZONE_DMA]) {
-		h[ZONE_DMA] += dma_reserve;
-		if (h[ZONE_DMA] >= z[ZONE_DMA]) {
-			printk(KERN_WARNING
-				"Kernel too large and filling up ZONE_DMA?\n");
-			h[ZONE_DMA] = z[ZONE_DMA];
-		}
-	}
-}
-
 #ifndef CONFIG_NUMA
 void __init paging_init(void)
 {
-	unsigned long zones[MAX_NR_ZONES], holes[MAX_NR_ZONES];
-
 	memory_present(0, 0, end_pfn);
 	sparse_init();
-	size_zones(zones, holes, 0, end_pfn);
-	free_area_init_node(0, NODE_DATA(0), zones,
-			    __pa(PAGE_OFFSET) >> PAGE_SHIFT, holes);
+	free_area_init_nodes(MAX_DMA_PFN, MAX_DMA32_PFN, end_pfn, end_pfn);
 }
 #endif
 
@@ -609,7 +552,8 @@ void __init mem_init(void)
 #else
 	totalram_pages = free_all_bootmem();
 #endif
-	reservedpages = end_pfn - totalram_pages - e820_hole_size(0, end_pfn);
+	reservedpages = end_pfn - totalram_pages -
+					absent_pages_in_range(0, end_pfn);
 
 	after_bootmem = 1;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/k8topology.c linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/k8topology.c
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/k8topology.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/k8topology.c	2006-04-24 15:51:46.000000000 +0100
@@ -146,6 +146,9 @@ int __init k8_scan_nodes(unsigned long s
 		
 		nodes[nodeid].start = base; 
 		nodes[nodeid].end = limit;
+		e820_register_active_regions(nodeid,
+				nodes[nodeid].start >> PAGE_SHIFT,
+				nodes[nodeid].end >> PAGE_SHIFT);
 
 		prevbase = base;
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/numa.c linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/numa.c
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/numa.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/numa.c	2006-04-24 15:51:46.000000000 +0100
@@ -161,7 +161,7 @@ void __init setup_node_bootmem(int nodei
 					 bootmap_start >> PAGE_SHIFT, 
 					 start_pfn, end_pfn); 
 
-	e820_bootmem_free(NODE_DATA(nodeid), start, end);
+	free_bootmem_with_active_regions(nodeid, end);
 
 	reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys, pgdat_size); 
 	reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start, bootmap_pages<<PAGE_SHIFT);
@@ -175,13 +175,11 @@ void __init setup_node_bootmem(int nodei
 void __init setup_node_zones(int nodeid)
 { 
 	unsigned long start_pfn, end_pfn, memmapsize, limit;
-	unsigned long zones[MAX_NR_ZONES];
-	unsigned long holes[MAX_NR_ZONES];
 
  	start_pfn = node_start_pfn(nodeid);
  	end_pfn = node_end_pfn(nodeid);
 
-	Dprintk(KERN_INFO "Setting up node %d %lx-%lx\n",
+	Dprintk(KERN_INFO "Setting up memmap for node %d %lx-%lx\n",
 		nodeid, start_pfn, end_pfn);
 
 	/* Try to allocate mem_map at end to not fill up precious <4GB
@@ -193,10 +191,6 @@ void __init setup_node_zones(int nodeid)
 				memmapsize, SMP_CACHE_BYTES, 
 				round_down(limit - memmapsize, PAGE_SIZE), 
 				limit);
-
-	size_zones(zones, holes, start_pfn, end_pfn);
-	free_area_init_node(nodeid, NODE_DATA(nodeid), zones,
-			    start_pfn, holes);
 } 
 
 void __init numa_init_array(void)
@@ -257,8 +251,11 @@ static int numa_emulation(unsigned long 
  		printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
  		return -1;
  	}
- 	for_each_online_node(i)
+ 	for_each_online_node(i) {
+		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+						nodes[i].end >> PAGE_SHIFT);
  		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+	}
  	numa_init_array();
  	return 0;
 }
@@ -297,6 +294,7 @@ void __init numa_initmem_init(unsigned l
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
 	node_to_cpumask[0] = cpumask_of_cpu(0);
+	e820_register_active_regions(0, start_pfn, end_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
 }
 
@@ -344,6 +342,8 @@ void __init paging_init(void)
 	for_each_online_node(i) {
 		setup_node_zones(i); 
 	}
+
+	free_area_init_nodes(MAX_DMA_PFN, MAX_DMA32_PFN, end_pfn, end_pfn);
 } 
 
 /* [numa=off] */
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
--- linux-2.6.17-rc2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c	2006-04-24 16:40:27.000000000 +0100
@@ -107,6 +107,7 @@ static __init void bad_srat(void)
 		apicid_to_node[i] = NUMA_NO_NODE;
 	for (i = 0; i < MAX_NUMNODES; i++)
 		nodes_add[i].start = nodes[i].end = 0;
+	remove_all_active_ranges();
 }
 
 static __init inline int srat_disabled(void)
@@ -188,7 +189,7 @@ static int hotadd_enough_memory(struct b
 
 	if (mem < 0)
 		return 0;
-	allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
+	allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) * PAGE_SIZE;
 	allowed = (allowed / 100) * hotadd_percent;
 	if (allocated + mem > allowed) {
 		/* Give them at least part of their hotadd memory upto hotadd_percent
@@ -236,7 +237,7 @@ static int reserve_hotadd(int node, unsi
 	}
 
 	/* This check might be a bit too strict, but I'm keeping it for now. */
-	if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
+	if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
 		printk(KERN_ERR "SRAT: Hotplug area has existing memory\n");
 		return -1;
 	}
@@ -330,6 +331,8 @@ acpi_numa_memory_affinity_init(struct ac
 
 	printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
 	       nd->start, nd->end);
+	e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
+						nd->end >> PAGE_SHIFT);
 
 #ifdef RESERVE_HOTADD
  	if (ma->flags.hot_pluggable && reserve_hotadd(node, start, end) < 0) {
@@ -354,13 +357,13 @@ static int nodes_cover_memory(void)
 		unsigned long s = nodes[i].start >> PAGE_SHIFT;
 		unsigned long e = nodes[i].end >> PAGE_SHIFT;
 		pxmram += e - s;
-		pxmram -= e820_hole_size(s, e);
+		pxmram -= absent_pages_in_range(s, e);
 		pxmram -= nodes_add[i].end - nodes_add[i].start;
 		if ((long)pxmram < 0)
 			pxmram = 0;
 	}
 
-	e820ram = end_pfn - e820_hole_size(0, end_pfn);
+	e820ram = end_pfn - absent_pages_in_range(0, end_pfn);
 	/* We seem to lose 3 pages somewhere. Allow a bit of slack. */
 	if ((long)(e820ram - pxmram) >= 1*1024*1024) {
 		printk(KERN_ERR
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/include/asm-x86_64/e820.h linux-2.6.17-rc2-104-x86_64_use_init_nodes/include/asm-x86_64/e820.h
--- linux-2.6.17-rc2-103-x86_use_init_nodes/include/asm-x86_64/e820.h	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/include/asm-x86_64/e820.h	2006-04-24 15:51:46.000000000 +0100
@@ -50,10 +50,9 @@ extern void e820_print_map(char *who);
 extern int e820_any_mapped(unsigned long start, unsigned long end, unsigned type);
 extern int e820_all_mapped(unsigned long start, unsigned long end, unsigned type);
 
-extern void e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end);
 extern void e820_setup_gap(void);
-extern unsigned long e820_hole_size(unsigned long start_pfn,
-				    unsigned long end_pfn);
+extern void e820_register_active_regions(int nid,
+				unsigned long start_pfn, unsigned long end_pfn);
 
 extern void __init parse_memopt(char *p, char **end);
 extern void __init parse_memmapopt(char *p, char **end);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-103-x86_use_init_nodes/include/asm-x86_64/proto.h linux-2.6.17-rc2-104-x86_64_use_init_nodes/include/asm-x86_64/proto.h
--- linux-2.6.17-rc2-103-x86_use_init_nodes/include/asm-x86_64/proto.h	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-104-x86_64_use_init_nodes/include/asm-x86_64/proto.h	2006-04-24 15:51:46.000000000 +0100
@@ -24,8 +24,6 @@ extern void mtrr_bp_init(void);
 #define mtrr_bp_init() do {} while (0)
 #endif
 extern void init_memory_mapping(unsigned long start, unsigned long end);
-extern void size_zones(unsigned long *z, unsigned long *h,
-			unsigned long start_pfn, unsigned long end_pfn);
 
 extern void system_call(void); 
 extern int kernel_syscall(void);

^ permalink raw reply

* [PATCH 3/7] Have x86 use add_active_range() and free_area_init_nodes
From: Mel Gorman @ 2006-04-24 20:21 UTC (permalink / raw)
  To: davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
	linuxppc-dev
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


Size zones and holes in an architecture independent manner for x86.

This has been boot tested on;

x86 with 4 CPUs, flatmem
x86 on NUMAQ

It needs to be boot tested on an x86 machine that uses SRAT.


 Kconfig        |    8 +---
 kernel/setup.c |   19 +++-------
 kernel/srat.c  |   98 ----------------------------------------------------
 mm/discontig.c |   59 ++++---------------------------
 4 files changed, 17 insertions(+), 167 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/Kconfig linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/Kconfig
--- linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/Kconfig	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/Kconfig	2006-04-24 15:50:57.000000000 +0100
@@ -569,12 +569,10 @@ config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 	depends on ARCH_SPARSEMEM_ENABLE
 
-source "mm/Kconfig"
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
 
-config HAVE_ARCH_EARLY_PFN_TO_NID
-	bool
-	default y
-	depends on NUMA
+source "mm/Kconfig"
 
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/kernel/setup.c linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/kernel/setup.c
--- linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/kernel/setup.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/kernel/setup.c	2006-04-24 15:50:57.000000000 +0100
@@ -1186,22 +1186,15 @@ static unsigned long __init setup_memory
 
 void __init zone_sizes_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
-	unsigned int max_dma, low;
+	unsigned int max_dma;
+#ifndef CONFIG_HIGHMEM
+	unsigned long highend_pfn = max_low_pfn;
+#endif
 
 	max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-	low = max_low_pfn;
 
-	if (low < max_dma)
-		zones_size[ZONE_DMA] = low;
-	else {
-		zones_size[ZONE_DMA] = max_dma;
-		zones_size[ZONE_NORMAL] = low - max_dma;
-#ifdef CONFIG_HIGHMEM
-		zones_size[ZONE_HIGHMEM] = highend_pfn - low;
-#endif
-	}
-	free_area_init(zones_size);
+	add_active_range(0, 0, highend_pfn);
+	free_area_init_nodes(max_dma, max_dma, max_low_pfn, highend_pfn);
 }
 #else
 extern unsigned long __init setup_memory(void);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/kernel/srat.c linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/kernel/srat.c
--- linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/kernel/srat.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/kernel/srat.c	2006-04-24 15:50:57.000000000 +0100
@@ -56,8 +56,6 @@ struct node_memory_chunk_s {
 static struct node_memory_chunk_s node_memory_chunk[MAXCHUNKS];
 
 static int num_memory_chunks;		/* total number of memory chunks */
-static int zholes_size_init;
-static unsigned long zholes_size[MAX_NUMNODES * MAX_NR_ZONES];
 
 extern void * boot_ioremap(unsigned long, unsigned long);
 
@@ -137,50 +135,6 @@ static void __init parse_memory_affinity
 		 "enabled and removable" : "enabled" ) );
 }
 
-#if MAX_NR_ZONES != 4
-#error "MAX_NR_ZONES != 4, chunk_to_zone requires review"
-#endif
-/* Take a chunk of pages from page frame cstart to cend and count the number
- * of pages in each zone, returned via zones[].
- */
-static __init void chunk_to_zones(unsigned long cstart, unsigned long cend, 
-		unsigned long *zones)
-{
-	unsigned long max_dma;
-	extern unsigned long max_low_pfn;
-
-	int z;
-	unsigned long rend;
-
-	/* FIXME: MAX_DMA_ADDRESS and max_low_pfn are trying to provide
-	 * similarly scoped information and should be handled in a consistant
-	 * manner.
-	 */
-	max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-
-	/* Split the hole into the zones in which it falls.  Repeatedly
-	 * take the segment in which the remaining hole starts, round it
-	 * to the end of that zone.
-	 */
-	memset(zones, 0, MAX_NR_ZONES * sizeof(long));
-	while (cstart < cend) {
-		if (cstart < max_dma) {
-			z = ZONE_DMA;
-			rend = (cend < max_dma)? cend : max_dma;
-
-		} else if (cstart < max_low_pfn) {
-			z = ZONE_NORMAL;
-			rend = (cend < max_low_pfn)? cend : max_low_pfn;
-
-		} else {
-			z = ZONE_HIGHMEM;
-			rend = cend;
-		}
-		zones[z] += rend - cstart;
-		cstart = rend;
-	}
-}
-
 /*
  * The SRAT table always lists ascending addresses, so can always
  * assume that the first "start" address that you see is the real
@@ -233,7 +187,6 @@ static int __init acpi20_parse_srat(stru
 
 	memset(pxm_bitmap, 0, sizeof(pxm_bitmap));	/* init proximity domain bitmap */
 	memset(node_memory_chunk, 0, sizeof(node_memory_chunk));
-	memset(zholes_size, 0, sizeof(zholes_size));
 
 	/* -1 in these maps means not available */
 	memset(pxm_to_nid_map, -1, sizeof(pxm_to_nid_map));
@@ -414,54 +367,3 @@ out_err:
 	printk("failed to get NUMA memory information from SRAT table\n");
 	return 0;
 }
-
-/* For each node run the memory list to determine whether there are
- * any memory holes.  For each hole determine which ZONE they fall
- * into.
- *
- * NOTE#1: this requires knowledge of the zone boundries and so
- * _cannot_ be performed before those are calculated in setup_memory.
- * 
- * NOTE#2: we rely on the fact that the memory chunks are ordered by
- * start pfn number during setup.
- */
-static void __init get_zholes_init(void)
-{
-	int nid;
-	int c;
-	int first;
-	unsigned long end = 0;
-
-	for_each_online_node(nid) {
-		first = 1;
-		for (c = 0; c < num_memory_chunks; c++){
-			if (node_memory_chunk[c].nid == nid) {
-				if (first) {
-					end = node_memory_chunk[c].end_pfn;
-					first = 0;
-
-				} else {
-					/* Record any gap between this chunk
-					 * and the previous chunk on this node
-					 * against the zones it spans.
-					 */
-					chunk_to_zones(end,
-						node_memory_chunk[c].start_pfn,
-						&zholes_size[nid * MAX_NR_ZONES]);
-				}
-			}
-		}
-	}
-}
-
-unsigned long * __init get_zholes_size(int nid)
-{
-	if (!zholes_size_init) {
-		zholes_size_init++;
-		get_zholes_init();
-	}
-	if (nid >= MAX_NUMNODES || !node_online(nid))
-		printk("%s: nid = %d is invalid/offline. num_online_nodes = %d",
-		       __FUNCTION__, nid, num_online_nodes());
-	return &zholes_size[nid * MAX_NR_ZONES];
-}
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/mm/discontig.c linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/mm/discontig.c
--- linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/i386/mm/discontig.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-103-x86_use_init_nodes/arch/i386/mm/discontig.c	2006-04-24 15:50:57.000000000 +0100
@@ -157,21 +157,6 @@ static void __init find_max_pfn_node(int
 		BUG();
 }
 
-/* Find the owning node for a pfn. */
-int early_pfn_to_nid(unsigned long pfn)
-{
-	int nid;
-
-	for_each_node(nid) {
-		if (node_end_pfn[nid] == 0)
-			break;
-		if (node_start_pfn[nid] <= pfn && node_end_pfn[nid] >= pfn)
-			return nid;
-	}
-
-	return 0;
-}
-
 /* 
  * Allocate memory for the pg_data_t for this node via a crude pre-bootmem
  * method.  For node zero take this from the bottom of memory, for
@@ -352,45 +337,17 @@ unsigned long __init setup_memory(void)
 void __init zone_sizes_init(void)
 {
 	int nid;
-
+	unsigned long max_dma_pfn;
 
 	for_each_online_node(nid) {
-		unsigned long zones_size[MAX_NR_ZONES] = {0, 0, 0};
-		unsigned long *zholes_size;
-		unsigned int max_dma;
-
-		unsigned long low = max_low_pfn;
-		unsigned long start = node_start_pfn[nid];
-		unsigned long high = node_end_pfn[nid];
-
-		max_dma = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-
-		if (node_has_online_mem(nid)){
-			if (start > low) {
-#ifdef CONFIG_HIGHMEM
-				BUG_ON(start > high);
-				zones_size[ZONE_HIGHMEM] = high - start;
-#endif
-			} else {
-				if (low < max_dma)
-					zones_size[ZONE_DMA] = low;
-				else {
-					BUG_ON(max_dma > low);
-					BUG_ON(low > high);
-					zones_size[ZONE_DMA] = max_dma;
-					zones_size[ZONE_NORMAL] = low - max_dma;
-#ifdef CONFIG_HIGHMEM
-					zones_size[ZONE_HIGHMEM] = high - low;
-#endif
-				}
-			}
-		}
-
-		zholes_size = get_zholes_size(nid);
-
-		free_area_init_node(nid, NODE_DATA(nid), zones_size, start,
-				zholes_size);
+		if (node_has_online_mem(nid))
+			add_active_range(nid, node_start_pfn[nid],
+							node_end_pfn[nid]);
 	}
+
+	max_dma_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
+	free_area_init_nodes(max_dma_pfn, max_dma_pfn,
+						max_low_pfn, highend_pfn);
 	return;
 }
 

^ permalink raw reply

* [PATCH 2/7] Have Power use add_active_range() and free_area_init_nodes()
From: Mel Gorman @ 2006-04-24 20:20 UTC (permalink / raw)
  To: davej, tony.luck, linuxppc-dev, linux-kernel, bob.picco, ak,
	linux-mm
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


Size zones and holes in an architecture independent manner for Power.

This has been boot tested on PPC64 with NUMA both enabled and disabled. It
has been compile tested for an older CHRP-based machine.


 powerpc/Kconfig   |   13 ++--
 powerpc/mm/mem.c  |   53 ++++++----------
 powerpc/mm/numa.c |  157 ++++---------------------------------------------
 ppc/Kconfig       |    3 
 ppc/mm/init.c     |   26 ++++----
 5 files changed, 62 insertions(+), 190 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/powerpc/Kconfig linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/powerpc/Kconfig
--- linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/powerpc/Kconfig	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/powerpc/Kconfig	2006-04-24 15:50:11.000000000 +0100
@@ -676,11 +676,16 @@ config ARCH_SPARSEMEM_DEFAULT
 	def_bool y
 	depends on SMP && PPC_PSERIES
 
-source "mm/Kconfig"
-
-config HAVE_ARCH_EARLY_PFN_TO_NID
+config ARCH_POPULATES_NODE_MAP
 	def_bool y
-	depends on NEED_MULTIPLE_NODES
+
+# Value of 256 is MAX_LMB_REGIONS * 2
+config MAX_ACTIVE_REGIONS
+	int
+	default 256
+	depends on ARCH_POPULATES_NODE_MAP
+
+source "mm/Kconfig"
 
 config ARCH_MEMORY_PROBE
 	def_bool y
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/powerpc/mm/mem.c linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/powerpc/mm/mem.c
--- linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/powerpc/mm/mem.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/powerpc/mm/mem.c	2006-04-24 15:50:11.000000000 +0100
@@ -252,20 +252,22 @@ void __init do_init_bootmem(void)
 
 	boot_mapsize = init_bootmem(start >> PAGE_SHIFT, total_pages);
 
+	/* Add active regions with valid PFNs */
+	for (i = 0; i < lmb.memory.cnt; i++) {
+		unsigned long start_pfn, end_pfn;
+		start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
+		end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
+		add_active_range(0, start_pfn, end_pfn);
+	}
+
 	/* Add all physical memory to the bootmem map, mark each area
 	 * present.
 	 */
-	for (i = 0; i < lmb.memory.cnt; i++) {
-		unsigned long base = lmb.memory.region[i].base;
-		unsigned long size = lmb_size_bytes(&lmb.memory, i);
 #ifdef CONFIG_HIGHMEM
-		if (base >= total_lowmem)
-			continue;
-		if (base + size > total_lowmem)
-			size = total_lowmem - base;
+	free_bootmem_with_active_regions(0, total_lowmem >> PAGE_SHIFT);
+#else
+	free_bootmem_with_active_regions(0, max_pfn);
 #endif
-		free_bootmem(base, size);
-	}
 
 	/* reserve the sections we're already using */
 	for (i = 0; i < lmb.reserved.cnt; i++)
@@ -273,9 +275,8 @@ void __init do_init_bootmem(void)
 				lmb_size_bytes(&lmb.reserved, i));
 
 	/* XXX need to clip this if using highmem? */
-	for (i = 0; i < lmb.memory.cnt; i++)
-		memory_present(0, lmb_start_pfn(&lmb.memory, i),
-			       lmb_end_pfn(&lmb.memory, i));
+	sparse_memory_present_with_active_regions(0);
+
 	init_bootmem_done = 1;
 }
 
@@ -284,8 +285,6 @@ void __init do_init_bootmem(void)
  */
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES];
-	unsigned long zholes_size[MAX_NR_ZONES];
 	unsigned long total_ram = lmb_phys_mem_size();
 	unsigned long top_of_ram = lmb_end_of_DRAM();
 
@@ -303,26 +302,18 @@ void __init paging_init(void)
 	       top_of_ram, total_ram);
 	printk(KERN_INFO "Memory hole size: %ldMB\n",
 	       (top_of_ram - total_ram) >> 20);
-	/*
-	 * All pages are DMA-able so we put them all in the DMA zone.
-	 */
-	memset(zones_size, 0, sizeof(zones_size));
-	memset(zholes_size, 0, sizeof(zholes_size));
-
-	zones_size[ZONE_DMA] = top_of_ram >> PAGE_SHIFT;
-	zholes_size[ZONE_DMA] = (top_of_ram - total_ram) >> PAGE_SHIFT;
-
 #ifdef CONFIG_HIGHMEM
-	zones_size[ZONE_DMA] = total_lowmem >> PAGE_SHIFT;
-	zones_size[ZONE_HIGHMEM] = (total_memory - total_lowmem) >> PAGE_SHIFT;
-	zholes_size[ZONE_HIGHMEM] = (top_of_ram - total_ram) >> PAGE_SHIFT;
+	free_area_init_nodes(total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT);
 #else
-	zones_size[ZONE_DMA] = top_of_ram >> PAGE_SHIFT;
-	zholes_size[ZONE_DMA] = (top_of_ram - total_ram) >> PAGE_SHIFT;
-#endif /* CONFIG_HIGHMEM */
+	free_area_init_nodes(top_of_ram >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT,
+				top_of_ram >> PAGE_SHIFT);
+#endif
 
-	free_area_init_node(0, NODE_DATA(0), zones_size,
-			    __pa(PAGE_OFFSET) >> PAGE_SHIFT, zholes_size);
 }
 #endif /* ! CONFIG_NEED_MULTIPLE_NODES */
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/powerpc/mm/numa.c linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/powerpc/mm/numa.c
--- linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/powerpc/mm/numa.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/powerpc/mm/numa.c	2006-04-24 15:50:11.000000000 +0100
@@ -39,96 +39,6 @@ static bootmem_data_t __initdata plat_no
 static int min_common_depth;
 static int n_mem_addr_cells, n_mem_size_cells;
 
-/*
- * We need somewhere to store start/end/node for each region until we have
- * allocated the real node_data structures.
- */
-#define MAX_REGIONS	(MAX_LMB_REGIONS*2)
-static struct {
-	unsigned long start_pfn;
-	unsigned long end_pfn;
-	int nid;
-} init_node_data[MAX_REGIONS] __initdata;
-
-int __init early_pfn_to_nid(unsigned long pfn)
-{
-	unsigned int i;
-
-	for (i = 0; init_node_data[i].end_pfn; i++) {
-		unsigned long start_pfn = init_node_data[i].start_pfn;
-		unsigned long end_pfn = init_node_data[i].end_pfn;
-
-		if ((start_pfn <= pfn) && (pfn < end_pfn))
-			return init_node_data[i].nid;
-	}
-
-	return -1;
-}
-
-void __init add_region(unsigned int nid, unsigned long start_pfn,
-		       unsigned long pages)
-{
-	unsigned int i;
-
-	dbg("add_region nid %d start_pfn 0x%lx pages 0x%lx\n",
-		nid, start_pfn, pages);
-
-	for (i = 0; init_node_data[i].end_pfn; i++) {
-		if (init_node_data[i].nid != nid)
-			continue;
-		if (init_node_data[i].end_pfn == start_pfn) {
-			init_node_data[i].end_pfn += pages;
-			return;
-		}
-		if (init_node_data[i].start_pfn == (start_pfn + pages)) {
-			init_node_data[i].start_pfn -= pages;
-			return;
-		}
-	}
-
-	/*
-	 * Leave last entry NULL so we dont iterate off the end (we use
-	 * entry.end_pfn to terminate the walk).
-	 */
-	if (i >= (MAX_REGIONS - 1)) {
-		printk(KERN_ERR "WARNING: too many memory regions in "
-				"numa code, truncating\n");
-		return;
-	}
-
-	init_node_data[i].start_pfn = start_pfn;
-	init_node_data[i].end_pfn = start_pfn + pages;
-	init_node_data[i].nid = nid;
-}
-
-/* We assume init_node_data has no overlapping regions */
-void __init get_region(unsigned int nid, unsigned long *start_pfn,
-		       unsigned long *end_pfn, unsigned long *pages_present)
-{
-	unsigned int i;
-
-	*start_pfn = -1UL;
-	*end_pfn = *pages_present = 0;
-
-	for (i = 0; init_node_data[i].end_pfn; i++) {
-		if (init_node_data[i].nid != nid)
-			continue;
-
-		*pages_present += init_node_data[i].end_pfn -
-			init_node_data[i].start_pfn;
-
-		if (init_node_data[i].start_pfn < *start_pfn)
-			*start_pfn = init_node_data[i].start_pfn;
-
-		if (init_node_data[i].end_pfn > *end_pfn)
-			*end_pfn = init_node_data[i].end_pfn;
-	}
-
-	/* We didnt find a matching region, return start/end as 0 */
-	if (*start_pfn == -1UL)
-		*start_pfn = 0;
-}
-
 static void __cpuinit map_cpu_to_node(int cpu, int node)
 {
 	numa_cpu_lookup_table[cpu] = node;
@@ -449,8 +359,8 @@ new_range:
 				continue;
 		}
 
-		add_region(nid, start >> PAGE_SHIFT,
-			   size >> PAGE_SHIFT);
+		add_active_range(nid, start >> PAGE_SHIFT,
+				(start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
 
 		if (--ranges)
 			goto new_range;
@@ -463,6 +373,7 @@ static void __init setup_nonnuma(void)
 {
 	unsigned long top_of_ram = lmb_end_of_DRAM();
 	unsigned long total_ram = lmb_phys_mem_size();
+	unsigned long start_pfn, end_pfn;
 	unsigned int i;
 
 	printk(KERN_INFO "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
@@ -470,9 +381,11 @@ static void __init setup_nonnuma(void)
 	printk(KERN_INFO "Memory hole size: %ldMB\n",
 	       (top_of_ram - total_ram) >> 20);
 
-	for (i = 0; i < lmb.memory.cnt; ++i)
-		add_region(0, lmb.memory.region[i].base >> PAGE_SHIFT,
-			   lmb_size_pages(&lmb.memory, i));
+	for (i = 0; i < lmb.memory.cnt; ++i) {
+		start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
+		end_pfn = start_pfn + lmb_size_pages(&lmb.memory, i);
+		add_active_range(0, start_pfn, end_pfn);
+	}
 	node_set_online(0);
 }
 
@@ -610,11 +523,11 @@ void __init do_init_bootmem(void)
 			  (void *)(unsigned long)boot_cpuid);
 
 	for_each_online_node(nid) {
-		unsigned long start_pfn, end_pfn, pages_present;
+		unsigned long start_pfn, end_pfn;
 		unsigned long bootmem_paddr;
 		unsigned long bootmap_pages;
 
-		get_region(nid, &start_pfn, &end_pfn, &pages_present);
+		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 
 		/* Allocate the node structure node local if possible */
 		NODE_DATA(nid) = careful_allocation(nid,
@@ -647,19 +560,7 @@ void __init do_init_bootmem(void)
 		init_bootmem_node(NODE_DATA(nid), bootmem_paddr >> PAGE_SHIFT,
 				  start_pfn, end_pfn);
 
-		/* Add free regions on this node */
-		for (i = 0; init_node_data[i].end_pfn; i++) {
-			unsigned long start, end;
-
-			if (init_node_data[i].nid != nid)
-				continue;
-
-			start = init_node_data[i].start_pfn << PAGE_SHIFT;
-			end = init_node_data[i].end_pfn << PAGE_SHIFT;
-
-			dbg("free_bootmem %lx %lx\n", start, end - start);
-  			free_bootmem_node(NODE_DATA(nid), start, end - start);
-		}
+		free_bootmem_with_active_regions(nid, end_pfn);
 
 		/* Mark reserved regions on this node */
 		for (i = 0; i < lmb.reserved.cnt; i++) {
@@ -690,44 +591,14 @@ void __init do_init_bootmem(void)
 			}
 		}
 
-		/* Add regions into sparsemem */
-		for (i = 0; init_node_data[i].end_pfn; i++) {
-			unsigned long start, end;
-
-			if (init_node_data[i].nid != nid)
-				continue;
-
-			start = init_node_data[i].start_pfn;
-			end = init_node_data[i].end_pfn;
-
-			memory_present(nid, start, end);
-		}
+		sparse_memory_present_with_active_regions(nid);
 	}
 }
 
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES];
-	unsigned long zholes_size[MAX_NR_ZONES];
-	int nid;
-
-	memset(zones_size, 0, sizeof(zones_size));
-	memset(zholes_size, 0, sizeof(zholes_size));
-
-	for_each_online_node(nid) {
-		unsigned long start_pfn, end_pfn, pages_present;
-
-		get_region(nid, &start_pfn, &end_pfn, &pages_present);
-
-		zones_size[ZONE_DMA] = end_pfn - start_pfn;
-		zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] - pages_present;
-
-		dbg("free_area_init node %d %lx %lx (hole: %lx)\n", nid,
-		    zones_size[ZONE_DMA], start_pfn, zholes_size[ZONE_DMA]);
-
-		free_area_init_node(nid, NODE_DATA(nid), zones_size, start_pfn,
-				    zholes_size);
-	}
+	unsigned long end_pfn = lmb_end_of_DRAM() >> PAGE_SHIFT;
+	free_area_init_nodes(end_pfn, end_pfn, end_pfn, end_pfn);
 }
 
 static int __init early_numa(char *p)
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/ppc/Kconfig linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/ppc/Kconfig
--- linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/ppc/Kconfig	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/ppc/Kconfig	2006-04-24 15:50:11.000000000 +0100
@@ -949,6 +949,9 @@ config NR_CPUS
 config HIGHMEM
 	bool "High memory support"
 
+config ARCH_POPULATES_NODE_MAP
+	def_bool y
+
 source kernel/Kconfig.hz
 source kernel/Kconfig.preempt
 source "mm/Kconfig"
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/ppc/mm/init.c linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/ppc/mm/init.c
--- linux-2.6.17-rc2-101-add_free_area_init_nodes/arch/ppc/mm/init.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-102-powerpc_use_init_nodes/arch/ppc/mm/init.c	2006-04-24 15:50:11.000000000 +0100
@@ -359,8 +359,7 @@ void __init do_init_bootmem(void)
  */
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES], i;
-
+	unsigned long start_pfn, end_pfn;
 #ifdef CONFIG_HIGHMEM
 	map_page(PKMAP_BASE, 0, 0);	/* XXX gross */
 	pkmap_page_table = pte_offset_kernel(pmd_offset(pgd_offset_k
@@ -370,19 +369,22 @@ void __init paging_init(void)
 			(KMAP_FIX_BEGIN), KMAP_FIX_BEGIN), KMAP_FIX_BEGIN);
 	kmap_prot = PAGE_KERNEL;
 #endif /* CONFIG_HIGHMEM */
-
-	/*
-	 * All pages are DMA-able so we put them all in the DMA zone.
-	 */
-	zones_size[ZONE_DMA] = total_lowmem >> PAGE_SHIFT;
-	for (i = 1; i < MAX_NR_ZONES; i++)
-		zones_size[i] = 0;
+	/* All pages are DMA-able so we put them all in the DMA zone. */
+	start_pfn = __pa(PAGE_OFFSET) >> PAGE_SHIFT;
+	end_pfn = start_pfn + (total_memory >> PAGE_SHIFT);
+	add_active_range(0, start_pfn, end_pfn);
 
 #ifdef CONFIG_HIGHMEM
-	zones_size[ZONE_HIGHMEM] = (total_memory - total_lowmem) >> PAGE_SHIFT;
+	free_area_init_nodes(total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				total_lowmem >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT);
+#else
+	free_area_init_nodes(total_memory >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT,
+				total_memory >> PAGE_SHIFT);
 #endif /* CONFIG_HIGHMEM */
-
-	free_area_init(zones_size);
 }
 
 void __init mem_init(void)

^ permalink raw reply

* [PATCH 1/7] Introduce mechanism for registering active regions of memory
From: Mel Gorman @ 2006-04-24 20:20 UTC (permalink / raw)
  To: davej, tony.luck, linux-mm, ak, bob.picco, linux-kernel,
	linuxppc-dev
  Cc: Mel Gorman
In-Reply-To: <20060424202009.20409.89016.sendpatchset@skynet>


This patch defines the structure to represent an active range of page
frames within a node in an architecture independent manner. Architectures
are expected to register active ranges of PFNs using add_active_range(nid,
start_pfn, end_pfn) and call free_area_init_nodes() passing the PFNs of
the end of each zone.


 include/linux/mm.h     |   19 ++
 include/linux/mmzone.h |   15 +
 mm/page_alloc.c        |  395 +++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 404 insertions(+), 25 deletions(-)

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-clean/include/linux/mm.h linux-2.6.17-rc2-101-add_free_area_init_nodes/include/linux/mm.h
--- linux-2.6.17-rc2-clean/include/linux/mm.h	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-101-add_free_area_init_nodes/include/linux/mm.h	2006-04-24 15:49:25.000000000 +0100
@@ -866,6 +866,25 @@ extern void free_area_init(unsigned long
 extern void free_area_init_node(int nid, pg_data_t *pgdat,
 	unsigned long * zones_size, unsigned long zone_start_pfn, 
 	unsigned long *zholes_size);
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+extern void free_area_init_nodes(unsigned long max_dma_pfn,
+					unsigned long max_dma32_pfn,
+					unsigned long max_low_pfn,
+					unsigned long max_high_pfn);
+extern void add_active_range(unsigned int nid, unsigned long start_pfn,
+					unsigned long end_pfn);
+extern void remove_all_active_ranges(void);
+extern void get_pfn_range_for_nid(unsigned int nid,
+			unsigned long *start_pfn, unsigned long *end_pfn);
+extern unsigned long find_min_pfn_with_active_regions(void);
+extern unsigned long find_max_pfn_with_active_regions(void);
+extern int early_pfn_to_nid(unsigned long pfn);
+extern void free_bootmem_with_active_regions(int nid,
+						unsigned long max_low_pfn);
+extern void sparse_memory_present_with_active_regions(int nid);
+extern unsigned long absent_pages_in_range(unsigned long start_pfn,
+						unsigned long end_pfn);
+#endif
 extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-clean/include/linux/mmzone.h linux-2.6.17-rc2-101-add_free_area_init_nodes/include/linux/mmzone.h
--- linux-2.6.17-rc2-clean/include/linux/mmzone.h	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-101-add_free_area_init_nodes/include/linux/mmzone.h	2006-04-24 15:49:25.000000000 +0100
@@ -271,6 +271,18 @@ struct zonelist {
 	struct zone *zones[MAX_NUMNODES * MAX_NR_ZONES + 1]; // NULL delimited
 };
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/*
+ * This represents an active range of physical memory. Architectures register
+ * a pfn range using add_active_range() and later initialise the nodes and
+ * free list with free_area_init_nodes()
+ */
+struct node_active_region {
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	int nid;
+};
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 
 /*
  * The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
@@ -465,7 +477,8 @@ extern struct zone *next_zone(struct zon
 
 #endif
 
-#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+#if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \
+	!defined(CONFIG_ARCH_POPULATES_NODE_MAP)
 #define early_pfn_to_nid(nid)  (0UL)
 #endif
 
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc2-clean/mm/page_alloc.c linux-2.6.17-rc2-101-add_free_area_init_nodes/mm/page_alloc.c
--- linux-2.6.17-rc2-clean/mm/page_alloc.c	2006-04-19 04:00:49.000000000 +0100
+++ linux-2.6.17-rc2-101-add_free_area_init_nodes/mm/page_alloc.c	2006-04-24 15:49:25.000000000 +0100
@@ -37,6 +37,8 @@
 #include <linux/nodemask.h>
 #include <linux/vmalloc.h>
 #include <linux/mempolicy.h>
+#include <linux/sort.h>
+#include <linux/pfn.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -85,6 +87,18 @@ int min_free_kbytes = 1024;
 unsigned long __initdata nr_kernel_pages;
 unsigned long __initdata nr_all_pages;
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+  #ifdef CONFIG_MAX_ACTIVE_REGIONS
+    #define MAX_ACTIVE_REGIONS CONFIG_MAX_ACTIVE_REGIONS
+  #else
+    #define MAX_ACTIVE_REGIONS (MAX_NR_ZONES * MAX_NUMNODES + 1)
+  #endif
+
+  struct node_active_region __initdata early_node_map[MAX_ACTIVE_REGIONS];
+  unsigned long __initdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
+  unsigned long __initdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
 #ifdef CONFIG_DEBUG_VM
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
@@ -1749,25 +1763,6 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
-static void __init calculate_zone_totalpages(struct pglist_data *pgdat,
-		unsigned long *zones_size, unsigned long *zholes_size)
-{
-	unsigned long realtotalpages, totalpages = 0;
-	int i;
-
-	for (i = 0; i < MAX_NR_ZONES; i++)
-		totalpages += zones_size[i];
-	pgdat->node_spanned_pages = totalpages;
-
-	realtotalpages = totalpages;
-	if (zholes_size)
-		for (i = 0; i < MAX_NR_ZONES; i++)
-			realtotalpages -= zholes_size[i];
-	pgdat->node_present_pages = realtotalpages;
-	printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
-}
-
-
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2054,6 +2049,221 @@ static __meminit void init_currently_emp
 	zone_init_free_lists(pgdat, zone, zone->spanned_pages);
 }
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+/* Note: nid == MAX_NUMNODES returns first region */
+static int __init first_active_region_index_in_nid(int nid)
+{
+	int i;
+	for (i = 0; early_node_map[i].end_pfn; i++) {
+		if (nid == MAX_NUMNODES || early_node_map[i].nid == nid)
+			return i;
+	}
+
+	return MAX_ACTIVE_REGIONS;
+}
+
+/* Note: nid == MAX_NUMNODES returns next region */
+static int __init next_active_region_index_in_nid(unsigned int index, int nid)
+{
+	for (index = index + 1; early_node_map[index].end_pfn; index++) {
+		if (nid == MAX_NUMNODES || early_node_map[index].nid == nid)
+			return index;
+	}
+
+	return MAX_ACTIVE_REGIONS;
+}
+
+#ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
+int __init early_pfn_to_nid(unsigned long pfn)
+{
+	int i;
+
+	for (i = 0; early_node_map[i].end_pfn; i++) {
+		unsigned long start_pfn = early_node_map[i].start_pfn;
+		unsigned long end_pfn = early_node_map[i].end_pfn;
+
+		if ((start_pfn <= pfn) && (pfn < end_pfn))
+			return early_node_map[i].nid;
+	}
+
+	return -1;
+}
+#endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
+
+#define for_each_active_range_index_in_nid(i, nid) \
+	for (i = first_active_region_index_in_nid(nid); \
+				i != MAX_ACTIVE_REGIONS; \
+				i = next_active_region_index_in_nid(i, nid))
+
+void __init free_bootmem_with_active_regions(int nid,
+						unsigned long max_low_pfn)
+{
+	unsigned int i;
+	for_each_active_range_index_in_nid(i, nid) {
+		unsigned long size_pages = 0;
+		unsigned long end_pfn = early_node_map[i].end_pfn;
+		if (early_node_map[i].start_pfn >= max_low_pfn)
+			continue;
+
+		if (end_pfn > max_low_pfn)
+			end_pfn = max_low_pfn;
+
+		size_pages = end_pfn - early_node_map[i].start_pfn;
+		free_bootmem_node(NODE_DATA(early_node_map[i].nid),
+				PFN_PHYS(early_node_map[i].start_pfn),
+				PFN_PHYS(size_pages));
+	}
+}
+
+void __init sparse_memory_present_with_active_regions(int nid)
+{
+	unsigned int i;
+	for_each_active_range_index_in_nid(i, nid)
+		memory_present(early_node_map[i].nid,
+				early_node_map[i].start_pfn,
+				early_node_map[i].end_pfn);
+}
+
+void __init get_pfn_range_for_nid(unsigned int nid,
+			unsigned long *start_pfn, unsigned long *end_pfn)
+{
+	unsigned int i;
+	*start_pfn = -1UL;
+	*end_pfn = 0;
+
+	for_each_active_range_index_in_nid(i, nid) {
+		if (early_node_map[i].start_pfn < *start_pfn)
+			*start_pfn = early_node_map[i].start_pfn;
+
+		if (early_node_map[i].end_pfn > *end_pfn)
+			*end_pfn = early_node_map[i].end_pfn;
+	}
+
+	if (*start_pfn == -1UL) {
+		printk(KERN_WARNING "Node %u active with no memory\n", nid);
+		*start_pfn = 0;
+	}
+}
+
+unsigned long __init zone_present_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *ignored)
+{
+	unsigned long node_start_pfn, node_end_pfn;
+	unsigned long zone_start_pfn, zone_end_pfn;
+
+	/* Get the start and end of the node and zone */
+	get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
+	zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
+	zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+
+	/* Check that this node has pages within the zone's required range */
+	if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
+		return 0;
+
+	/* Move the zone boundaries inside the node if necessary */
+	if (zone_end_pfn > node_end_pfn)
+		zone_end_pfn = node_end_pfn;
+	if (zone_start_pfn < node_start_pfn)
+		zone_start_pfn = node_start_pfn;
+
+	/* Return the spanned pages */
+	return zone_end_pfn - zone_start_pfn;
+}
+
+unsigned long __init __absent_pages_in_range(int nid,
+				unsigned long range_start_pfn,
+				unsigned long range_end_pfn)
+{
+	int i = 0;
+	unsigned long prev_end_pfn = 0, hole_pages = 0;
+	unsigned long start_pfn;
+
+	/* Find the end_pfn of the first active range of pfns in the node */
+	i = first_active_region_index_in_nid(nid);
+	prev_end_pfn = early_node_map[i].start_pfn;
+
+	/* Find all holes for the zone within the node */
+	for (; i != MAX_ACTIVE_REGIONS;
+			i = next_active_region_index_in_nid(i, nid)) {
+
+		/* No need to continue if prev_end_pfn is outside the zone */
+		if (prev_end_pfn >= range_end_pfn)
+			break;
+
+		/* Make sure the end of the zone is not within the hole */
+		start_pfn = early_node_map[i].start_pfn;
+		if (start_pfn > range_end_pfn)
+			start_pfn = range_end_pfn;
+		if (prev_end_pfn < range_start_pfn)
+			prev_end_pfn = range_start_pfn;
+
+		/* Update the hole size cound and move on */
+		if (start_pfn > range_start_pfn) {
+			BUG_ON(prev_end_pfn > start_pfn);
+			hole_pages += start_pfn - prev_end_pfn;
+		}
+		prev_end_pfn = early_node_map[i].end_pfn;
+	}
+
+	return hole_pages;
+}
+
+unsigned long __init absent_pages_in_range(unsigned long start_pfn,
+							unsigned long end_pfn)
+{
+	return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn);
+}
+
+unsigned long __init zone_absent_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *ignored)
+{
+	return __absent_pages_in_range(nid,
+				arch_zone_lowest_possible_pfn[zone_type],
+				arch_zone_highest_possible_pfn[zone_type]);
+}
+#else
+static inline unsigned long zone_present_pages_in_node(int nid,
+					unsigned long zone_type,
+					unsigned long *zones_size)
+{
+	return zones_size[zone_type];
+}
+
+static inline unsigned long zone_absent_pages_in_node(int nid,
+						unsigned long zone_type,
+						unsigned long *zholes_size)
+{
+	if (!zholes_size)
+		return 0;
+
+	return zholes_size[zone_type];
+}
+#endif
+
+static void __init calculate_node_totalpages(struct pglist_data *pgdat,
+		unsigned long *zones_size, unsigned long *zholes_size)
+{
+	unsigned long realtotalpages, totalpages = 0;
+	int i;
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		totalpages += zone_present_pages_in_node(pgdat->node_id, i,
+								zones_size);
+	}
+	pgdat->node_spanned_pages = totalpages;
+
+	realtotalpages = totalpages;
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		realtotalpages -=
+			zone_absent_pages_in_node(pgdat->node_id, i, zholes_size);
+	}
+	pgdat->node_present_pages = realtotalpages;
+	printk(KERN_DEBUG "On node %d totalpages: %lu\n", pgdat->node_id,
+							realtotalpages);
+}
+
 /*
  * Set up the zone data structures:
  *   - mark all pages reserved
@@ -2076,10 +2286,9 @@ static void __init free_area_init_core(s
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize;
 
-		realsize = size = zones_size[j];
-		if (zholes_size)
-			realsize -= zholes_size[j];
-
+		size = zone_present_pages_in_node(nid, j, zones_size);
+		realsize = size - zone_absent_pages_in_node(nid, j,
+								zholes_size);
 		if (j < ZONE_HIGHMEM)
 			nr_kernel_pages += realsize;
 		nr_all_pages += realsize;
@@ -2146,13 +2355,151 @@ void __init free_area_init_node(int nid,
 {
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
-	calculate_zone_totalpages(pgdat, zones_size, zholes_size);
+	calculate_node_totalpages(pgdat, zones_size, zholes_size);
 
 	alloc_node_mem_map(pgdat);
 
 	free_area_init_core(pgdat, zones_size, zholes_size);
 }
 
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+void __init add_active_range(unsigned int nid, unsigned long start_pfn,
+						unsigned long end_pfn)
+{
+	unsigned int i;
+
+	/* Merge with existing active regions if possible */
+	for (i = 0; early_node_map[i].end_pfn; i++) {
+		if (early_node_map[i].nid != nid)
+			continue;
+
+		/* Skip if an existing region covers this new one */
+		if (start_pfn >= early_node_map[i].start_pfn &&
+				end_pfn <= early_node_map[i].end_pfn)
+			return;
+
+		/* Merge forward if suitable */
+		if (start_pfn <= early_node_map[i].end_pfn &&
+				end_pfn > early_node_map[i].end_pfn) {
+			early_node_map[i].end_pfn = end_pfn;
+			return;
+		}
+
+		/* Merge backward if suitable */
+		if (start_pfn < early_node_map[i].end_pfn &&
+				end_pfn >= early_node_map[i].start_pfn) {
+			early_node_map[i].start_pfn = start_pfn;
+			return;
+		}
+	}
+
+	/* Leave last entry NULL, we use range.end_pfn to terminate the walk */
+	if (i >= MAX_ACTIVE_REGIONS - 1) {
+		printk(KERN_ERR "Too many memory regions, truncating\n");
+		return;
+	}
+
+	early_node_map[i].nid = nid;
+	early_node_map[i].start_pfn = start_pfn;
+	early_node_map[i].end_pfn = end_pfn;
+}
+
+void __init remove_all_active_ranges()
+{
+	memset(early_node_map, 0, sizeof(early_node_map));
+}
+
+/* Compare two active node_active_regions */
+static int __init cmp_node_active_region(const void *a, const void *b)
+{
+	struct node_active_region *arange = (struct node_active_region *)a;
+	struct node_active_region *brange = (struct node_active_region *)b;
+
+	/* Done this way to avoid overflows */
+	if (arange->start_pfn > brange->start_pfn)
+		return 1;
+	if (arange->start_pfn < brange->start_pfn)
+		return -1;
+
+	return 0;
+}
+
+/* sort the node_map by start_pfn */
+static void __init sort_node_map(void)
+{
+	size_t num = 0;
+	while (early_node_map[num].end_pfn)
+		num++;
+
+	sort(early_node_map, num, sizeof(struct node_active_region),
+						cmp_node_active_region, NULL);
+}
+
+/* Find the lowest pfn for a node. This depends on a sorted early_node_map */
+unsigned long __init find_min_pfn_for_node(unsigned long nid)
+{
+	int i;
+
+	/* Assuming a sorted map, the first range found has the starting pfn */
+	for_each_active_range_index_in_nid(i, nid)
+		return early_node_map[i].start_pfn;
+
+	/* nid does not exist in early_node_map */
+	printk(KERN_WARNING "Could not find start_pfn for node %lu\n", nid);
+	return 0;
+}
+
+unsigned long __init find_min_pfn_with_active_regions(void)
+{
+	return find_min_pfn_for_node(MAX_NUMNODES);
+}
+
+unsigned long __init find_max_pfn_with_active_regions(void)
+{
+	int i;
+	unsigned long max_pfn = 0;
+
+	for (i = 0; early_node_map[i].end_pfn; i++)
+		max_pfn = max(max_pfn, early_node_map[i].end_pfn);
+
+	return max_pfn;
+}
+
+void __init free_area_init_nodes(unsigned long arch_max_dma_pfn,
+				unsigned long arch_max_dma32_pfn,
+				unsigned long arch_max_low_pfn,
+				unsigned long arch_max_high_pfn)
+{
+	unsigned long nid;
+	int zone_index;
+
+	/* Record where the zone boundaries are */
+	memset(arch_zone_lowest_possible_pfn, 0,
+				sizeof(arch_zone_lowest_possible_pfn));
+	memset(arch_zone_highest_possible_pfn, 0,
+				sizeof(arch_zone_highest_possible_pfn));
+	arch_zone_lowest_possible_pfn[ZONE_DMA] =
+					find_min_pfn_with_active_regions();
+	arch_zone_highest_possible_pfn[ZONE_DMA] = arch_max_dma_pfn;
+	arch_zone_highest_possible_pfn[ZONE_DMA32] = arch_max_dma32_pfn;
+	arch_zone_highest_possible_pfn[ZONE_NORMAL] = arch_max_low_pfn;
+	arch_zone_highest_possible_pfn[ZONE_HIGHMEM] = arch_max_high_pfn;
+	for (zone_index = 1; zone_index < MAX_NR_ZONES; zone_index++) {
+		arch_zone_lowest_possible_pfn[zone_index] =
+			arch_zone_highest_possible_pfn[zone_index-1];
+	}
+
+	/* Regions in the early_node_map can be in any order */
+	sort_node_map();
+
+	for_each_online_node(nid) {
+		pg_data_t *pgdat = NODE_DATA(nid);
+		free_area_init_node(nid, pgdat, NULL,
+				find_min_pfn_for_node(nid), NULL);
+	}
+}
+#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
+
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 static bootmem_data_t contig_bootmem_data;
 struct pglist_data contig_page_data = { .bdata = &contig_bootmem_data };

^ permalink raw reply

* [PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V4
From: Mel Gorman @ 2006-04-24 20:20 UTC (permalink / raw)
  To: davej, tony.luck, linuxppc-dev, linux-kernel, bob.picco, ak,
	linux-mm
  Cc: Mel Gorman

This is V4 of the patchset to size zones and memory holes in an
architecture-independent manner.

After this release, I would like to look at getting this into the -mm
tree assuming there are no boot-related problems found before I would
also like to hear some feedback. Andi has already stated for release V2
that he didn't think it helped x86_64 much, so later releases made more
effort at reducing the x86_64-specific code.  Andi, are you still concerned
about the x86_64 changes? For the i386, ia64 and power maintainers, have
you concerns or do you think these are ready to get some wider testing?
For anyone else watching, code reviews and acks would be very welcome
because my own testing cannot cover every machine configuration.

The reasons why I'd like to this merged include;

 o Less architecture-specific code - particuarly for x86 and ppc64
 o More maintainable. Changes to zone layout need only be made in one place
 o Zone-sizing and memory hole calculation is one less job that needs to be
   done for new architecture ports
 o With the architecture-independent representation, zone-based
   anti-fragmentation needs a lot less architecture-specific code making it
   more portable between architectures. This will be important for future
   hugepage-availability work
 o Nigel Cunningham has stated that that software suspend could potentially
   use the architecture-independent representation to discover what pages
   need to be saved during suspend

Changelog since V3
o Rebase to 2.6.17-rc2
o Allow the active regions to be cleared. Needed by x86_64 when it decides
  the SRAT table is bad half way through the registering of active regions
o Fix for flatmem x86_64 machines booting

Changelog since V2
o Fix a bug where holes in lower zones get double counted
o Catch the case where a new range is registered that is within an range
o Catch the case where a zone boundary is within a hole
o Use the EFI map for registering ranges on x86_64+numa
o On IA64+NUMA, add the active ranges before rounding for granules
o On x86_64, remove e820_hole_size and e820_bootmem_free and use
  arch-independent equivalents
o On x86_64, remove the map walk in e820_end_of_ram()
o Rename memory_present_with_active_regions, name ambiguous
o Add absent_pages_in_range() for arches to call

Changelog since V1
o Correctly convert virtual and physical addresses to PFNs on ia64
o Correctly convert physical addresses to PFN on older ppc 
o When add_active_range() is called with overlapping pfn ranges, merge them
o When a zone boundary occurs within a memory hole, account correctly
o Minor whitespace damage cleanup
o Debugging patch temporarily included

At a basic level, architectures define structures to record where active
ranges of page frames are located. Once located, the code to calculate
zone sizes and holes in each architecture is very similar.  Some of this
zone and hole sizing code is difficult to read for no good reason. This
set of patches eliminates the similar-looking architecture-specific code.

The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range(). When all areas
have been discovered, free_area_init_nodes() is called to initialise
the pgdat and zones. The zone sizes and holes are then calculated in an
architecture independent manner.

Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 128 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 150 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 96 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 57 arch-specific LOC removed

At this point, there is a reduction of 431 architecture-specific lines of code
and a net reduction of 52 lines. The arch-independent code is a lot easier
to read in comparison to some of the arch-specific stuff, particularly in
arch/i386/ .

For Patch 6, it was also noted that page_alloc.c has a *lot* of
initialisation code which makes the file harder to read than it needs to
be. Patch 6 creates a new file mem_init.c and moves a lot of initialisation
code from page_alloc.c to it. After the patch is applied, there is still a net
loss of 33 lines.

The patches have been successfully boot tested by me and verified that the
zones are the correct size on

o x86, flatmem with 1.5GiB of RAM
o x86, NUMAQ
o PPC64, NUMA
o PPC64, CONFIG_NUMA=n
o Power, RS6000 with and without HIGHMEM enabled
o x86_64, NUMA with SRAT
o x86_64, NUMA with broken SRAT that falls back to k8topology discovery
o x86_64, ACPI_NUMA, ACPI_MEMORY_HOTPLUG && !SPARSEMEM to trigger the
  hotadd path without sparsemem fun in srat.c (SRAT broken on test machine and
  I'm pretty sure the machine does not support physical memory hotadd anyway
  so test may not have been effective other than being a compile test.)
o x86_64, CONFIG_NUMA=n
o x86_64, AMD64 desktop machine with flatmem

Tony Luck has succesfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig. Bob Picco has also tested and debugged
on IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine. These were on patches against 2.6.17-rc1 but there have been no
ia64-changes made between release 3 and 4 of these patches.

There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory
holes but the architecture-specific code accounts the memory as present.

The net reduction seems small but the big benefit of this set of patches
is the reduction of 431 lines of architecture-specific code, some of
which is very hairy. There should be a greater net reduction when other
architectures use the same mechanisms for zone and hole sizing but I lack
the hardware to test on.

Comments?

Additional credit;
	Dave Hansen for the initial suggestion and comments on early patches
	Andy Whitcroft for reviewing early versions and catching numerous errors
	Tony Luck for testing and debugging on IA64
	Bob Picco for testing and fixing bugs related to pfn registration
	Jack Steiner and Yasunori for testing on IA64

 arch/i386/Kconfig           |    8 
 arch/i386/kernel/setup.c    |   19 
 arch/i386/kernel/srat.c     |   98 ----
 arch/i386/mm/discontig.c    |   59 --
 arch/ia64/Kconfig           |    3 
 arch/ia64/mm/contig.c       |   60 --
 arch/ia64/mm/discontig.c    |   41 -
 arch/ia64/mm/init.c         |   12 
 arch/powerpc/Kconfig        |   13 
 arch/powerpc/mm/mem.c       |   53 --
 arch/powerpc/mm/numa.c      |  157 ------
 arch/ppc/Kconfig            |    3 
 arch/ppc/mm/init.c          |   26 -
 arch/x86_64/Kconfig         |    3 
 arch/x86_64/kernel/e820.c   |  109 +---
 arch/x86_64/kernel/setup.c  |    6 
 arch/x86_64/mm/init.c       |   62 --
 arch/x86_64/mm/k8topology.c |    3 
 arch/x86_64/mm/numa.c       |   18 
 arch/x86_64/mm/srat.c       |   10 
 include/asm-ia64/meminit.h  |    1 
 include/asm-x86_64/e820.h   |    5 
 include/asm-x86_64/proto.h  |    2 
 include/linux/mm.h          |   19 
 include/linux/mmzone.h      |   15 
 mm/Makefile                 |    2 
 mm/mem_init.c               | 1044 ++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             |  678 ----------------------------
 28 files changed, 1248 insertions(+), 1281 deletions(-)
-- 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
From: Mark A. Greer @ 2006-04-24 19:21 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc
In-Reply-To: <1468.1145570898@www002.gmx.net>

On Fri, Apr 21, 2006 at 12:08:18AM +0200, Gerhard Pircher wrote:
> > --- Ursprüngliche Nachricht ---
> > Von: Eugene Surovegin <ebs@ebshome.net>
> > An: Gerhard Pircher <gerhard_pircher@gmx.net>
> > Kopie: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> > Betreff: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> > Datum: Thu, 20 Apr 2006 14:55:14 -0700
> > 
> > Well, you aren't the first person who tries to run G4 with 
> > CONFIG_NOT_COHERENT_CACHE. This was done before and I don't remember 
> > that those people had to implement anything as complex as you are 
> > trying to do.
> 
> Maybe these systems have cache coherent northbridges, which is not the case
> for the AmigaOne and its "famous" ArticiaS northbridge.

Nope.  I believe Eugene is referring to the Marvell 64x60 line of *North*
bridges.

> > You can try asking on #mklinux. It always better to ask people who 
> > actually _did_ this :).
> > 
> > In fact, I just grepped 2.6 and found 
> > #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what 
> > systems usually have this type of bridge? Not 4xx/8xx, that's for sure.
> 
> Hmm, strange. AFAIK the NOT_COHERENT_CACHE config option is available only
> for the 4xx and 8xx platforms. Wouldn't the config option depend on
> CONFIG_6XX too, if there are not cache coherent systems with G4 cpus? 
> 
> At least I could not compile in the dma-mapping.c file without modifying the
> Kconfig file.

If you're looking in arch/ppc/Kconfig in either the kernel.org or paulus'
git trees, look further down.  There is a separate option where
NOT_COHERENT_CACHE can be set for 64x60 bridges.

Mark

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox