* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 [not found] <20060205163618.GB21972@in.ibm.com> @ 2006-02-05 17:03 ` Andi Kleen 2006-02-06 16:11 ` Christoph Lameter 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2006-02-05 17:03 UTC (permalink / raw) To: discuss, bharata; +Cc: linux-kernel, Christoph Lameter On Sunday 05 February 2006 17:36, Bharata B Rao wrote: > Hi, > > I am seeing a kernel crash with 2.6.16-rc1 and rc2 but not on any > 2.6.15 kernels (rc and 2.6.15.2). Arch is x86_64. > > The kernel crashes when I run an application which does: > - mmap (0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) > - mbind the memory to the 1st node with policy MPOL_BIND > - write to that memory > > The crash time log on 2.6.16-rc2 looks like this: > > Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: > <ffffffff801614df>{__rmqueue+63} There's another report of it. The boot logs seem ok, so I guess mbind broke somehow. I suppose it's related to the mempolicy changes that went into 2.6.16-rc1. I'll try to take a look tomorrow if Christoph doesn't beat it. OOM with mbind seems to have broken also - it oopses too. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-05 17:03 ` [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 Andi Kleen @ 2006-02-06 16:11 ` Christoph Lameter 2006-02-06 18:12 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-06 16:11 UTC (permalink / raw) To: Andi Kleen; +Cc: discuss, bharata, linux-kernel

On Sun, 5 Feb 2006, Andi Kleen wrote:

> > The kernel crashes when I run an application which does:
> > - mmap (0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS)
> > - mbind the memory to the 1st node with policy MPOL_BIND
> > - write to that memory

Tried the following code on rc1 and rc2 and it worked fine on ia64:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <numaif.h>

int main(int argc, void *argv[])
{
	char *p;
	unsigned long nodes = 0x01;

	p = mmap(0, 32768, PROT_READ| PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, 0, 0);
	mbind(p, 32768, MPOL_BIND, &nodes, 64, 0);
	p[34] = 89;
	return 0;
}

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 16:11 ` Christoph Lameter @ 2006-02-06 18:12 ` Andi Kleen 2006-02-06 18:25 ` Christoph Lameter 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2006-02-06 18:12 UTC (permalink / raw) To: Christoph Lameter; +Cc: discuss, bharata, linux-kernel On Monday 06 February 2006 17:11, Christoph Lameter wrote: > On Sun, 5 Feb 2006, Andi Kleen wrote: > > > > The kernel crashes when I run an application which does: > > > - mmap (0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) > > > - mbind the memory to the 1st node with policy MPOL_BIND > > > - write to that memory > > Tried the following code on rc1 and rc2 and it worked fine on ia64: Perhaps it depends on if the node has enough memory free or not? I assume if the zonelist has some issue but the first entry is ok it will only cause problems when the allocation has to go off node (it shouldn't actually go off node with that policy of course, but with a full free local node that code path is never triggered) -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 18:12 ` Andi Kleen @ 2006-02-06 18:25 ` Christoph Lameter 2006-02-06 18:31 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-06 18:25 UTC (permalink / raw) To: Andi Kleen; +Cc: discuss, bharata, linux-kernel

On Mon, 6 Feb 2006, Andi Kleen wrote:

> > Tried the following code on rc1 and rc2 and it worked fine on ia64:
>
> Perhaps it depends on if the node has enough memory free or not?
> I assume if the zonelist has some issue but the first entry is ok
> it will only cause problems when the allocation has to go off node
> (it shouldn't actually go off node with that policy of course,

If node 0 is exhausted then you have an OOM situation.

> but with a full free local node that code path is never triggered)

Want me to test the OOM path for mbind?

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 18:25 ` Christoph Lameter @ 2006-02-06 18:31 ` Andi Kleen 2006-02-06 18:45 ` Christoph Lameter 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2006-02-06 18:31 UTC (permalink / raw) To: Christoph Lameter; +Cc: discuss, bharata, linux-kernel

On Monday 06 February 2006 19:25, Christoph Lameter wrote:
> On Mon, 6 Feb 2006, Andi Kleen wrote:
>
> > > Tried the following code on rc1 and rc2 and it worked fine on ia64:
> >
> > Perhaps it depends on if the node has enough memory free or not?
> > I assume if the zonelist has some issue but the first entry is ok
> > it will only cause problems when the allocation has to go off node
> > (it shouldn't actually go off node with that policy of course,
>
> If node 0 is exhausted then you have an OOM situation.

No - it could just need to free some cleanable pages first. That's
a long way before going OOM.

> > but with a full free local node that code path is never triggered)
>
> Want me to test the OOM path for mbind?

I already know it oopses - someone else reported that. If you feel
motivated feel free to fix.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 18:31 ` Andi Kleen @ 2006-02-06 18:45 ` Christoph Lameter 2006-02-06 18:55 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-06 18:45 UTC (permalink / raw) To: Andi Kleen; +Cc: discuss, bharata, linux-kernel

On Mon, 6 Feb 2006, Andi Kleen wrote:

> > If node 0 is exhausted then you have an OOM situation.
>
> No - it could just need to free some cleanable pages first. That's
> a long way before going OOM.

Then node 0 still has memory available. So you suspect zone_reclaim?

> > > but with a full free local node that code path is never triggered)
> >
> > Want me to test the OOM path for mbind?
> I already know it oopses - someone else reported that. If you feel
> motivated feel free to fix.

We also have a minor issue with huge pages. If the pools are exhausted
then the kernel will terminate the application with Bus Error.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 18:45 ` Christoph Lameter @ 2006-02-06 18:55 ` Andi Kleen 2006-02-06 19:22 ` Christoph Lameter 2006-02-07 5:59 ` Bharata B Rao 0 siblings, 2 replies; 29+ messages in thread From: Andi Kleen @ 2006-02-06 18:55 UTC (permalink / raw) To: Christoph Lameter; +Cc: discuss, bharata, linux-kernel

On Monday 06 February 2006 19:45, Christoph Lameter wrote:
> On Mon, 6 Feb 2006, Andi Kleen wrote:
>
> > > If node 0 is exhausted then you have an OOM situation.
> >
> > No - it could just need to free some cleanable pages first. That's
> > a long way before going OOM.
>
> Then node 0 still has memory available. So you suspect zone_reclaim?

Either zone reclaim or the first entry in the zonelist is ok, but it's
not correctly terminated or something like that so it causes
problems when the kernel looks for the second (just speculating here,
I don't know if that is the problem)

> > > > but with a full free local node that code path is never triggered)
> > >
> > > Want me to test the OOM path for mbind?
> > I already know it oopses - someone else reported that. If you feel
> > motivated feel free to fix.
>
> We also have a minor issue with huge pages. If the pools are exhausted
> then the kernel will terminate the application with Bus Error.

That is what prereservation was supposed to prevent. I remember there
were endless discussions when this all was originally implemented long
ago (in the version that never got merged).

Basically there were two approaches:
- Do strict overcommit checking at mmap with prereservation
  (that was what the old Intel/SGI patch did)
- The hackish way I implemented in SLES9: just check at mmap time
  if there are enough pages but don't prereserve anything.

That was more of an 80% solution with races, but seemed to fix the problem
well enough that people in the field didn't really complain. The
advantage was that it was much simpler code.

-Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 18:55 ` Andi Kleen @ 2006-02-06 19:22 ` Christoph Lameter 2006-02-07 5:59 ` Bharata B Rao 1 sibling, 0 replies; 29+ messages in thread From: Christoph Lameter @ 2006-02-06 19:22 UTC (permalink / raw) To: Andi Kleen; +Cc: discuss, bharata, linux-kernel

On Mon, 6 Feb 2006, Andi Kleen wrote:

> That is what prereservation was supposed to prevent. I remember there
> were endless discussions when this all was originally implemented long
> ago (in the version that never got merged).

But the reservation does not consider cpusets and memory policies, right?
It surely must fail if one restricts allocation to one node and then we run
out of memory. That was the testcase that showed the Bus Error.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-06 18:55 ` Andi Kleen 2006-02-06 19:22 ` Christoph Lameter @ 2006-02-07 5:59 ` Bharata B Rao 2006-02-07 16:49 ` Christoph Lameter 1 sibling, 1 reply; 29+ messages in thread From: Bharata B Rao @ 2006-02-07 5:59 UTC (permalink / raw) To: Andi Kleen; +Cc: Christoph Lameter, discuss, linux-kernel On Mon, Feb 06, 2006 at 07:55:18PM +0100, Andi Kleen wrote: > On Monday 06 February 2006 19:45, Christoph Lameter wrote: > > On Mon, 6 Feb 2006, Andi Kleen wrote: > > > > > > If node 0 is exhausted then you have an OOM situation. > > > > > > No - it could just need to free some cleanable pages first. That's > > > a long way before going OOM. > > > > Then node 0 still has memory available. So you suspect zone_reclaim? > > Either zone reclaim or the first entry in the zonelist is ok, but it's > not correctly terminated or something like that so it causes > problems when the kernel looks for the second (just speculating here, > i don't know if that is the problem) > I can still crash my x86_64 box with Christoph's program. The meminfo in my case looks like this just before I execute the program. 
llm07:~ # cat /sys/devices/system/node/node0/meminfo

Node 0 MemTotal: 3095532 kB
Node 0 MemFree: 2960972 kB
Node 0 MemUsed: 134560 kB
Node 0 Active: 19752 kB
Node 0 Inactive: 14908 kB
Node 0 HighTotal: 0 kB
Node 0 HighFree: 0 kB
Node 0 LowTotal: 3095532 kB
Node 0 LowFree: 2960972 kB
Node 0 Dirty: 0 kB
Node 0 Writeback: 576 kB
Node 0 Mapped: 0 kB
Node 0 Slab: 24200 kB
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0

llm07:~ # cat /sys/devices/system/node/node1/meminfo

Node 1 MemTotal: 2002368 kB
Node 1 MemFree: 1964464 kB
Node 1 MemUsed: 37904 kB
Node 1 Active: 10608 kB
Node 1 Inactive: 3056 kB
Node 1 HighTotal: 0 kB
Node 1 HighFree: 0 kB
Node 1 LowTotal: 2002368 kB
Node 1 LowFree: 1964464 kB
Node 1 Dirty: 1164 kB
Node 1 Writeback: 0 kB
Node 1 Mapped: 43064 kB
Node 1 Slab: 9648 kB
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0

I was trying to bind the memory to node 0, which still has enough
free memory.

Not sure if this helps, but I have some more debug data. While the
kernel (2.6.16-rc1) oopses at page_alloc.c, line 556 (list_del(&page->lru)),
some of the variables in __rmqueue look like this at the time of the crash:

page = 0xffffffffffffffd8
&page->lru = 0000000000000000
zone = 0xffff81000000e700
zone->name Normal
current_order 0
area->nr_free 0

Regards,
Bharata.

^ permalink raw reply [flat|nested] 29+ messages in thread
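The two pointer values in the debug data above are consistent with list_entry() being applied to a free-list head whose next pointer is NULL. The following is a minimal user-space sketch of that arithmetic, not kernel code. The toy_page structure and its 40-byte padding are assumptions made for illustration: the padding is chosen so that lru sits at offset 0x28, which is what page = 0xffffffffffffffd8 would imply if &page->lru were 0. Pointers are modelled as 64-bit integers so the result does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's list node; pointers modelled as 64-bit values. */
struct toy_list_head {
	uint64_t next;
	uint64_t prev;
};

/*
 * Hypothetical stand-in for struct page. The real 2.6.16 layout is not
 * shown in the thread; the 40-byte pad is an assumption chosen so that
 * offsetof(struct toy_page, lru) == 0x28.
 */
struct toy_page {
	unsigned char pad[40];
	struct toy_list_head lru;
};

/* Address that list_entry(lru_addr, struct page, lru) would compute. */
static uint64_t entry_from_lru(uint64_t lru_addr)
{
	return lru_addr - (uint64_t)offsetof(struct toy_page, lru);
}

/* Address touched when a list_del() on that node loads entry->prev. */
static uint64_t prev_field_addr(uint64_t lru_addr)
{
	return lru_addr + (uint64_t)offsetof(struct toy_list_head, prev);
}
```

Under these assumptions entry_from_lru(0) wraps around to 0xffffffffffffffd8, and prev_field_addr(0) is 0x8, which would match the faulting address in the original oops if list_del() loaded entry->prev first.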
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-07 5:59 ` Bharata B Rao @ 2006-02-07 16:49 ` Christoph Lameter 2006-02-07 23:27 ` Ray Bryant 0 siblings, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-07 16:49 UTC (permalink / raw) To: Bharata B Rao; +Cc: Andi Kleen, discuss, linux-kernel On Tue, 7 Feb 2006, Bharata B Rao wrote: > I can still crash my x86_64 box with Christoph's program. So it looks like the problem is arch specific. Test program runs fine on ia64. > page = 0xffffffffffffffd8 > &page->lru = 0000000000000000 Yup lru field overwritten as I thought. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-07 16:49 ` Christoph Lameter @ 2006-02-07 23:27 ` Ray Bryant 2006-02-07 23:36 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Ray Bryant @ 2006-02-07 23:27 UTC (permalink / raw) To: Christoph Lameter; +Cc: Bharata B Rao, Andi Kleen, discuss, linux-kernel On Tuesday 07 February 2006 10:49, Christoph Lameter wrote: > On Tue, 7 Feb 2006, Bharata B Rao wrote: > > I can still crash my x86_64 box with Christoph's program. > > So it looks like the problem is arch specific. Test program runs fine on > ia64. > > > page = 0xffffffffffffffd8 > > &page->lru = 0000000000000000 > > Yup lru field overwritten as I thought. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ For what it is worth: Christoph's test program runs fine on my 32 GB, 4 socket, 8 core Opteron 64 box with 2.6.16-rc1. -- Ray Bryant AMD Performance Labs Austin, Tx 512-602-0038 (o) 512-507-7807 (c) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-07 23:27 ` Ray Bryant @ 2006-02-07 23:36 ` Andi Kleen 2006-02-08 12:10 ` Bharata B Rao 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2006-02-07 23:36 UTC (permalink / raw) To: Ray Bryant; +Cc: Christoph Lameter, Bharata B Rao, discuss, linux-kernel

On Wednesday 08 February 2006 00:27, Ray Bryant wrote:
> On Tuesday 07 February 2006 10:49, Christoph Lameter wrote:
> > On Tue, 7 Feb 2006, Bharata B Rao wrote:
> > > I can still crash my x86_64 box with Christoph's program.
> >
> > So it looks like the problem is arch specific. Test program runs fine on
> > ia64.
> >
> > > page = 0xffffffffffffffd8
> > > &page->lru = 0000000000000000
> >
> > Yup lru field overwritten as I thought.
>
> For what it is worth:
>
> Christoph's test program runs fine on my 32 GB, 4 socket, 8 core Opteron 64

Opteron 64? A new exciting upcoming product? @)

> box with 2.6.16-rc1.

Yes it also works on my test box and also some other simple tests with MPOL_BIND.
But we had similar reports on two different systems, so there's very likely a problem.
Just need to reproduce it somehow.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-07 23:36 ` Andi Kleen @ 2006-02-08 12:10 ` Bharata B Rao 2006-02-08 15:42 ` Christoph Lameter 0 siblings, 1 reply; 29+ messages in thread From: Bharata B Rao @ 2006-02-08 12:10 UTC (permalink / raw) To: Andi Kleen; +Cc: Ray Bryant, Christoph Lameter, discuss, linux-kernel

On Wed, Feb 08, 2006 at 12:36:30AM +0100, Andi Kleen wrote:
> On Wednesday 08 February 2006 00:27, Ray Bryant wrote:
> > On Tuesday 07 February 2006 10:49, Christoph Lameter wrote:
> > > On Tue, 7 Feb 2006, Bharata B Rao wrote:
> > > > I can still crash my x86_64 box with Christoph's program.
> > >
> > > So it looks like the problem is arch specific. Test program runs fine on
> > > ia64.
> > >
> > > > page = 0xffffffffffffffd8
> > > > &page->lru = 0000000000000000
> > >
> > > Yup lru field overwritten as I thought.
> >
> > For what it is worth:
> >
> > Christoph's test program runs fine on my 32 GB, 4 socket, 8 core Opteron 64
>
> Opteron 64? A new exciting upcoming product? @)
>
> > box with 2.6.16-rc1.
>
> Yes it also works on my test box and also some other simple tests with MPOL_BIND.
> But we had similar reports on two different systems, so there's very likely a problem.
> Just need to reproduce it somehow.
>

I believe I understand why I am seeing this problem with my setup.

The zones in my machine look like this:

On node 0 totalpages: 773791
  DMA zone: 2151 pages, LIFO batch:0
  DMA32 zone: 771640 pages, LIFO batch:31
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
On node 1 totalpages: 500592
  DMA zone: 0 pages, LIFO batch:0
  DMA32 zone: 242032 pages, LIFO batch:31
  Normal zone: 258560 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:0

So it can be seen that node 0 has only DMA and DMA32 zones while
node 1 has only DMA32 and Normal zones.

The current mempolicy code assumes that the highest zone (policy_zone) that
comes under the memory policy is valid (by which I mean zone->present_pages
is non-zero) for all nodes, which is not true in my case. In this case
the policy_zone gets set to ZONE_NORMAL (highest zone here).

When mbind'ing to node 0, bind_zonelist() (and subsequent functions) binds
the ZONE_NORMAL zone to vma->vm_policy. During the write fault, the allocator
is asked to allocate from a non-existent ZONE_NORMAL zone for node 0. This
I believe is causing the oops I am seeing. It is still not clear to me
why the allocator doesn't gracefully fail allocations from a zone which has
zone->present_pages=0.

This whole problem wasn't seen on 2.6.15.2 because bind_zonelist()
actually makes sure that the zone it is binding to has a non-zero
zone->present_pages.

Regards,
Bharata.

^ permalink raw reply [flat|nested] 29+ messages in thread
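The zonelist construction described above can be modelled in user space. This is only a sketch under stated assumptions, not the kernel's bind_zonelist(): the names toy_zone, toy_node, toy_nodes and toy_bind_zonelist are hypothetical, the page counts are copied from the boot log in this message, and the skip_empty flag is an illustrative switch between the behaviour described for 2.6.16-rc (take the policy_zone entry unconditionally) and for 2.6.15.2 (only take zones with present_pages != 0).

```c
#include <assert.h>
#include <stddef.h>

enum { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };
enum { TOY_MAX_NODES = 2 };

struct toy_zone { unsigned long present_pages; };
struct toy_node { struct toy_zone node_zones[MAX_NR_ZONES]; };

/* Page counts taken from the boot log quoted in the message above. */
static struct toy_node toy_nodes[TOY_MAX_NODES] = {
	{ { { 2151 }, { 771640 }, { 0 }, { 0 } } },      /* node 0 */
	{ { { 0 }, { 242032 }, { 258560 }, { 0 } } },    /* node 1 */
};

/*
 * Fill zl[] with the policy_zone entry of every node set in nodemask and
 * terminate it with NULL. With skip_empty == 0 every selected node
 * contributes its zone even if it holds no pages; with skip_empty != 0
 * zones without present pages are left out. Returns the number of zones
 * stored.
 */
static size_t toy_bind_zonelist(unsigned long nodemask, int policy_zone,
				int skip_empty,
				struct toy_zone *zl[TOY_MAX_NODES + 1])
{
	size_t num = 0;
	int nd;

	assert(policy_zone >= 0 && policy_zone < MAX_NR_ZONES);

	for (nd = 0; nd < TOY_MAX_NODES; nd++) {
		struct toy_zone *zone;

		if (!(nodemask & (1UL << nd)))
			continue;
		zone = &toy_nodes[nd].node_zones[policy_zone];
		if (skip_empty && zone->present_pages == 0)
			continue;
		zl[num++] = zone;
	}
	zl[num] = NULL;
	return num;
}
```

With nodemask 0x1 and policy_zone ZONE_NORMAL, the unfiltered variant returns a one-entry list whose only zone has zero present pages, which is the situation described for node 0. The filtered variant returns an empty list for the same input.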
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 12:10 ` Bharata B Rao @ 2006-02-08 15:42 ` Christoph Lameter 2006-02-08 15:45 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-08 15:42 UTC (permalink / raw) To: Bharata B Rao; +Cc: Andi Kleen, Ray Bryant, discuss, linux-kernel

On Wed, 8 Feb 2006, Bharata B Rao wrote:

> The zones in my machine look like this:
>
> On node 0 totalpages: 773791
>   DMA zone: 2151 pages, LIFO batch:0
>   DMA32 zone: 771640 pages, LIFO batch:31
>   Normal zone: 0 pages, LIFO batch:0
>   HighMem zone: 0 pages, LIFO batch:0
> On node 1 totalpages: 500592
>   DMA zone: 0 pages, LIFO batch:0
>   DMA32 zone: 242032 pages, LIFO batch:31
>   Normal zone: 258560 pages, LIFO batch:31
>   HighMem zone: 0 pages, LIFO batch:0
>
> So it can be seen that the node 0 has only DMA and DMA32 zones while
> node 1 has only DMA32 and Normal zones.

Uhh... That's a rather asymmetric arrangement.

> The current mempolicy code assumes that the highest zone(policy_zone) that
> comes under the memory policy is valid (by which I mean zone->present_pages
> is non-zero) for all nodes, which is not true in my case. In this case
> the policy_zone gets set to ZONE_NORMAL (highest zone here).

Right.

> When mbind'ing to node 0, bind_zonelist()(and subsequent functions) binds
> the ZONE_NORMAL zone to vma->vm_policy. During the write fault, the allocator
> is asked to allocate from a non-existent ZONE_NORMAL zone for node 0. This
> I believe is causing the oops I am seeing. It is still not clear to me
> why doesn't the allocator fail the allocations from a zone which has
> zone->present_pages=0 gracefully.

Hmm....

> This whole problem wasn't seen on 2.6.15.2 because, bind_zonelist()
> actually makes sure that the zone it is binding to has a non-zero
> zone->present_pages.

Correct, there was a loop in bind_zonelist that I moved to the zone
initialization to simplify it.

However, this has implications for policy_zone. This variable should store the zone that policies apply to. However, in your case this zone will vary which may lead to all sorts of weird behavior even if we fix bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 15:42 ` Christoph Lameter @ 2006-02-08 15:45 ` Andi Kleen 2006-02-08 15:59 ` Christoph Lameter 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2006-02-08 15:45 UTC (permalink / raw) To: Christoph Lameter; +Cc: Bharata B Rao, Ray Bryant, discuss, linux-kernel On Wednesday 08 February 2006 16:42, Christoph Lameter wrote: > However, this has implications for policy_zone. This variable should store > the zone that policies apply to. However, in your case this zone will vary > which may lead to all sorts of weird behavior even if we fix > bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32? It really needs to apply to both (currently you can't police 4GB of your memory on x86-64) But I haven't worked out a good design how to implement it yet. -Andi > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 15:45 ` Andi Kleen @ 2006-02-08 15:59 ` Christoph Lameter 2006-02-08 16:06 ` Andi Kleen 0 siblings, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-08 15:59 UTC (permalink / raw) To: Andi Kleen; +Cc: Bharata B Rao, Ray Bryant, discuss, linux-kernel

On Wed, 8 Feb 2006, Andi Kleen wrote:

> On Wednesday 08 February 2006 16:42, Christoph Lameter wrote:
>
> > However, this has implications for policy_zone. This variable should store
> > the zone that policies apply to. However, in your case this zone will vary
> > which may lead to all sorts of weird behavior even if we fix
> > bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32?
>
> It really needs to apply to both (currently you can't police 4GB of your
> memory on x86-64) But I haven't worked out a good design how to implement it yet.

So a provisional solution would be to simply ignore empty zones in
bind_zonelist? Or fall back to earlier zones (which includes unpolicied
zones in the bind zone list?)

Index: linux-2.6.16-rc2/mm/mempolicy.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/mempolicy.c	2006-02-02 22:03:08.000000000 -0800
+++ linux-2.6.16-rc2/mm/mempolicy.c	2006-02-08 07:55:29.000000000 -0800
@@ -143,8 +143,12 @@ static struct zonelist *bind_zonelist(no
 	if (!zl)
 		return NULL;
 	num = 0;
-	for_each_node_mask(nd, *nodes)
-		zl->zones[num++] = &NODE_DATA(nd)->node_zones[policy_zone];
+	for_each_node_mask(nd, *nodes) {
+		struct zone *zone = &NODE_DATA(nd)->node_zones[policy_zone];
+
+		if (zone->present_pages)
+			zl->zones[num++] = &NODE_DATA(nd)->node_zones[policy_zone];
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 15:59 ` Christoph Lameter @ 2006-02-08 16:06 ` Andi Kleen 2006-02-08 16:20 ` Christoph Lameter 2006-02-09 4:39 ` Bharata B Rao 0 siblings, 2 replies; 29+ messages in thread From: Andi Kleen @ 2006-02-08 16:06 UTC (permalink / raw) To: Christoph Lameter; +Cc: Bharata B Rao, Ray Bryant, discuss, linux-kernel On Wednesday 08 February 2006 16:59, Christoph Lameter wrote: > On Wed, 8 Feb 2006, Andi Kleen wrote: > > > On Wednesday 08 February 2006 16:42, Christoph Lameter wrote: > > > > > However, this has implications for policy_zone. This variable should store > > > the zone that policies apply to. However, in your case this zone will vary > > > which may lead to all sorts of weird behavior even if we fix > > > bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32? > > > > It really needs to apply to both (currently you can't police 4GB of your > > memory on x86-64) But I haven't worked out a good design how to implement it yet. > > So a provisional solution would be to simply ignore empty zones in > bind_zonelist? That would likely prevent the crash yes (Bharata can you test?) But of course it still has the problem of a lot of memory being unpolicied on machines with >4GB if there's both DMA32 and NORMAL. > Or fall back to earlier zones (which includes unpolicied > zones in the bind zone list?) Or that. Thanks, -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 16:06 ` Andi Kleen @ 2006-02-08 16:20 ` Christoph Lameter 2006-02-08 16:27 ` Andi Kleen 2006-02-09 4:39 ` Bharata B Rao 1 sibling, 1 reply; 29+ messages in thread From: Christoph Lameter @ 2006-02-08 16:20 UTC (permalink / raw) To: Andi Kleen; +Cc: Bharata B Rao, Ray Bryant, discuss, linux-kernel

On Wed, 8 Feb 2006, Andi Kleen wrote:

> > So a provisional solution would be to simply ignore empty zones in
> > bind_zonelist?
>
> That would likely prevent the crash yes (Bharata can you test?)
>
> But of course it still has the problem of a lot of memory being unpolicied
> on machines with >4GB if there's both DMA32 and NORMAL.

The fix could result in a zonelist with no zones. So we can answer one
question in __alloc_pages().

Index: linux-2.6.16-rc2/mm/page_alloc.c
===================================================================
--- linux-2.6.16-rc2.orig/mm/page_alloc.c	2006-02-08 00:05:09.000000000 -0800
+++ linux-2.6.16-rc2/mm/page_alloc.c	2006-02-08 08:18:59.000000000 -0800
@@ -913,7 +913,7 @@ restart:
 	z = zonelist->zones;  /* the list of zones suitable for gfp_mask */
 
 	if (unlikely(*z == NULL)) {
-		/* Should this ever happen?? */
+		/* May occur if MPOL_BIND results in an empty zonelist */
 		return NULL;
 	}
 

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 16:20 ` Christoph Lameter @ 2006-02-08 16:27 ` Andi Kleen 2006-02-08 16:51 ` Christoph Lameter 0 siblings, 1 reply; 29+ messages in thread From: Andi Kleen @ 2006-02-08 16:27 UTC (permalink / raw) To: discuss; +Cc: Christoph Lameter, Bharata B Rao, Ray Bryant, linux-kernel On Wednesday 08 February 2006 17:20, Christoph Lameter wrote: > On Wed, 8 Feb 2006, Andi Kleen wrote: > > > > So a provisional solution would be to simply ignore empty zones in > > > bind_zonelist? > > > > That would likely prevent the crash yes (Bharata can you test?) > > > > But of course it still has the problem of a lot of memory being unpolicied > > on machines with >4GB if there's both DMA32 and NORMAL. > > The fix could result in a zonelist with no zones. So we can answer one > question in __alloc_pages(). I don't think it can happen - at least one zone <= policy-zone has to have memory otherwise the machine wouldn't work at all. -Andi ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 16:27 ` Andi Kleen @ 2006-02-08 16:51 ` Christoph Lameter 0 siblings, 0 replies; 29+ messages in thread From: Christoph Lameter @ 2006-02-08 16:51 UTC (permalink / raw) To: Andi Kleen; +Cc: discuss, Bharata B Rao, Ray Bryant, linux-kernel On Wed, 8 Feb 2006, Andi Kleen wrote: > > The fix could result in a zonelist with no zones. So we can answer one > > question in __alloc_pages(). > > I don't think it can happen - at least one zone <= policy-zone has to > have memory otherwise the machine wouldn't work at all. One could bind to a nodeset that contains a single node. If that node has no memory in the policy zone then the zonelist generated by bind_zonelist will be empty. ^ permalink raw reply [flat|nested] 29+ messages in thread
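The empty-zonelist case discussed in the last few messages can be illustrated with a small user-space model. This is a sketch under assumptions, not the kernel's __alloc_pages(): toy_zone and toy_alloc_pages are hypothetical names, and the allocation is reduced to a first-fit walk over a NULL-terminated zonelist. It only shows why an early check on the first entry lets an allocation against an empty MPOL_BIND zonelist fail with NULL instead of touching a zone that does not exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified zone: just a free page counter. */
struct toy_zone { unsigned long free_pages; };

/*
 * Walk a NULL-terminated zonelist and take nr_pages from the first zone
 * that has enough free pages. Returns the zone used, or NULL if the
 * zonelist is empty or no zone can satisfy the request.
 */
static struct toy_zone *toy_alloc_pages(struct toy_zone **zonelist,
					unsigned long nr_pages)
{
	struct toy_zone **z = zonelist;

	assert(zonelist != NULL);

	/* Models the early check patched above: empty list, nothing to try. */
	if (*z == NULL)
		return NULL;

	for (; *z != NULL; z++) {
		if ((*z)->free_pages >= nr_pages) {
			(*z)->free_pages -= nr_pages;
			return *z;
		}
	}
	return NULL;
}
```

In this model a caller that binds to a single node with no memory in the policy zone gets an empty list, and every allocation against it returns NULL. In the kernel that would surface as a failed page fault for the application rather than an oops.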
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 2006-02-08 16:06 ` Andi Kleen 2006-02-08 16:20 ` Christoph Lameter @ 2006-02-09 4:39 ` Bharata B Rao 2006-02-09 9:58 ` Andi Kleen 1 sibling, 1 reply; 29+ messages in thread From: Bharata B Rao @ 2006-02-09 4:39 UTC (permalink / raw) To: Andi Kleen; +Cc: Christoph Lameter, Ray Bryant, discuss, linux-kernel On Wed, Feb 08, 2006 at 05:06:26PM +0100, Andi Kleen wrote: > On Wednesday 08 February 2006 16:59, Christoph Lameter wrote: > > On Wed, 8 Feb 2006, Andi Kleen wrote: > > > > > On Wednesday 08 February 2006 16:42, Christoph Lameter wrote: > > > > > > > However, this has implications for policy_zone. This variable should store > > > > the zone that policies apply to. However, in your case this zone will vary > > > > which may lead to all sorts of weird behavior even if we fix > > > > bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32? > > > > > > It really needs to apply to both (currently you can't police 4GB of your > > > memory on x86-64) But I haven't worked out a good design how to implement it yet. > > > > So a provisional solution would be to simply ignore empty zones in > > bind_zonelist? > > That would likely prevent the crash yes (Bharata can you test?) With this solution, the kernel doesn't crash, but the application does. Shouldn't we fail mbind if we can't bind any zones ? Something like this... 
Signed-off-by: Bharata B Rao <bharata@in.ibm.com>

--- linux-2.6.16-rc2/mm/mempolicy.c.orig	2006-02-09 01:34:37.000000000 -0800
+++ linux-2.6.16-rc2/mm/mempolicy.c	2006-02-09 01:39:32.000000000 -0800
@@ -143,8 +143,18 @@
 	if (!zl)
 		return NULL;
 	num = 0;
-	for_each_node_mask(nd, *nodes)
-		zl->zones[num++] = &NODE_DATA(nd)->node_zones[policy_zone];
+	for_each_node_mask(nd, *nodes) {
+		struct zone *zone = &NODE_DATA(nd)->node_zones[policy_zone];
+
+		if (zone->present_pages)
+			zl->zones[num++] = zone;
+	}
+
+	if (!num) {
+		/* failed to bind even a single zone */
+		kfree(zl);
+		return NULL;
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }

>
> But of course it still has the problem of a lot of memory being unpolicied
> on machines with >4GB if there's both DMA32 and NORMAL.
>
> > Or fall back to earlier zones (which includes unpolicied
> > zones in the bind zone list?)
>

Does it make sense to have a separate policy_zone for each node so that we
have at least one (highest) zone in a node which comes under memory policy?

Regards,
Bharata.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-09 4:39 ` Bharata B Rao
@ 2006-02-09 9:58 ` Andi Kleen
  2006-02-14 19:33 ` Christoph Lameter
  0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2006-02-09 9:58 UTC (permalink / raw)
  To: bharata; +Cc: Christoph Lameter, Ray Bryant, discuss, linux-kernel

On Thursday 09 February 2006 05:39, Bharata B Rao wrote:
> On Wed, Feb 08, 2006 at 05:06:26PM +0100, Andi Kleen wrote:
> > On Wednesday 08 February 2006 16:59, Christoph Lameter wrote:
> > > On Wed, 8 Feb 2006, Andi Kleen wrote:
> > >
> > > > On Wednesday 08 February 2006 16:42, Christoph Lameter wrote:
> > > >
> > > > > However, this has implications for policy_zone. This variable should store
> > > > > the zone that policies apply to. However, in your case this zone will vary
> > > > > which may lead to all sorts of weird behavior even if we fix
> > > > > bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32?
> > > >
> > > > It really needs to apply to both (currently you can't police 4GB of your
> > > > memory on x86-64) But I haven't worked out a good design how to implement it yet.
> > >
> > > So a provisional solution would be to simply ignore empty zones in
> > > bind_zonelist?
> >
> > That would likely prevent the crash yes (Bharata can you test?)
>
> With this solution, the kernel doesn't crash, but the application does.
>
> Shouldn't we fail mbind if we can't bind any zones ?

Really need to fix this properly to support both zones in mbind

> Does it make sense to have a separate policy_zone for each node so that we
> have at least one (highest) zone in a node which comes under memory policy ?

That wouldn't solve the problem. The problem is that the mempolicy needs at
least two zonelists to handle all types of allocations (that is why I added
the concept of policy zone in the first place - to avoid the need for
multilevel zonelists in the policies).

Or maybe it's better to just not do any policy for GFP_DMA32 allocations
and always use the highest zonelist. I guess they're somewhat rare anyways
and the policy will rarely succeed.

-Andi

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-09 9:58 ` Andi Kleen
@ 2006-02-14 19:33 ` Christoph Lameter
  2006-02-15 5:46 ` Bharata B Rao
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Lameter @ 2006-02-14 19:33 UTC (permalink / raw)
  To: bharata; +Cc: Andi Kleen, Christoph Lameter, Ray Bryant, discuss, linux-kernel

I just took another look at this issue and I cannot see anything wrong. An
empty zone should be ignored by the page allocator since nr_free == 0. My
patch should not be needed.

Could you get us the contents of the struct zone that the page allocator
is trying to get memory from?

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-14 19:33 ` Christoph Lameter
@ 2006-02-15 5:46 ` Bharata B Rao
  2006-02-15 10:38 ` Bharata B Rao
  0 siblings, 1 reply; 29+ messages in thread
From: Bharata B Rao @ 2006-02-15 5:46 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, Ray Bryant, discuss, linux-kernel

On Tue, Feb 14, 2006 at 11:33:00AM -0800, Christoph Lameter wrote:
> I just took another look at this issue and I cannot see anything wrong. An
> empty zone should be ignored by the page allocator since nr_free == 0. My
> patch should not be needed.

There is a check for list_empty(&area->free_list) in __rmqueue(), which
I think is one of the points in the page allocator where the emptiness of
the free_area list is checked. The current zone (when the crash happens)
bypasses this test leading to this crash.

>
> Could you get us the contents of the struct zone that the page allocator
> is trying to get memory from?

The zone looks like this:

crash> p *(struct zone *)0xffff81000000e700
$1 = {
  free_pages = 0,
  pages_min = 0,
  pages_low = 0,
  pages_high = 0,
  lowmem_reserve = {0, 0, 0, 0},
  pageset = {0xffff81000c013740, 0xffff81013fc42f40, 0xffffffff8071d600,
    0xffffffff8071d680, 0xffffffff8071d700, 0xffffffff8071d780,
    0xffffffff8071d800, 0xffffffff8071d880, 0xffffffff8071d900,
    0xffffffff8071d980, 0xffffffff8071da00, 0xffffffff8071da80,
    0xffffffff8071db00, 0xffffffff8071db80, 0xffffffff8071dc00,
    0xffffffff8071dc80, 0xffffffff8071dd00, 0xffffffff8071dd80,
    0xffffffff8071de00, 0xffffffff8071de80, 0xffffffff8071df00,
    0xffffffff8071df80, 0xffffffff8071e000, 0xffffffff8071e080,
    0xffffffff8071e100, 0xffffffff8071e180, 0xffffffff8071e200,
    0xffffffff8071e280, 0xffffffff8071e300, 0xffffffff8071e380,
    0xffffffff8071e400, 0xffffffff8071e480},
  lock = {
    raw_lock = {
      slock = 0
    },
    break_lock = 1
  },
  free_area = {
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 },
    { free_list = { next = 0x0, prev = 0x0 }, nr_free = 0 }},
  _pad1_ = {
    x = 0xffff81000000e980 "\001"
  },
  lru_lock = {
    raw_lock = {
      slock = 1
    },
    break_lock = 0
  },
  active_list = {
    next = 0xffff81000000e988,
    prev = 0xffff81000000e988
  },
  inactive_list = {
    next = 0xffff81000000e998,
    prev = 0xffff81000000e998
  },
  nr_scan_active = 0,
  nr_scan_inactive = 0,
  nr_active = 0,
  nr_inactive = 0,
  pages_scanned = 0,
  all_unreclaimable = 0,
  reclaim_in_progress = {
    counter = 0
  },
  last_unsuccessful_zone_reclaim = 0,
  temp_priority = 12,
  prev_priority = 12,
  _pad2_ = {
    x = 0xffff81000000ea00 ""
  },
  wait_table = 0x0,
  wait_table_size = 0,
  wait_table_bits = 0,
  zone_pgdat = 0xffff81000000e000,
  zone_mem_map = 0x0,
  zone_start_pfn = 0,
  spanned_pages = 0,
  present_pages = 0,
  name = 0xffffffff804a858c "Normal"
}

Regards,
Bharata.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-15 5:46 ` Bharata B Rao
@ 2006-02-15 10:38 ` Bharata B Rao
  2006-02-15 11:21 ` Andi Kleen
  2006-02-15 18:10 ` Christoph Lameter
  0 siblings, 2 replies; 29+ messages in thread
From: Bharata B Rao @ 2006-02-15 10:38 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, Ray Bryant, discuss, linux-kernel

On Wed, Feb 15, 2006 at 11:16:20AM +0530, Bharata B Rao wrote:
> On Tue, Feb 14, 2006 at 11:33:00AM -0800, Christoph Lameter wrote:
> > I just took another look at this issue and I cannot see anything wrong. An
> > empty zone should be ignored by the page allocator since nr_free == 0. My
> > patch should not be needed.
>
> There is a check for list_empty(&area->free_list) in __rmqueue(), which
> I think is one of the points in the page allocator where the emptiness of
> the free_area list is checked. The current zone (when the crash happens)
> bypasses this test leading to this crash.
>

We don't initialize the free_area list for all zones. Instead,
free_area_init_core() does that only for zones which are non-empty.
But in __rmqueue(), we depend on these free_area lists to be initialized
correctly for all zones, which is not true in the present case we
are discussing.
I think we either need to initialize free_area lists for all zones
or check for !zone->free_area->nr_free in __rmqueue().

Even with this, mbind still needs to be fixed. Even though it
can't get a conforming zone in the node (MPOL_BIND case), right now,
it goes ahead with the "bind"ing of the memory area. This causes the
application to crash (assuming we have fixed the __rmqueue kernel crash)
(Haven't yet figured out why exactly the application dies)

Regards,
Bharata.

^ permalink raw reply [flat|nested] 29+ messages in thread
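[Editorial note] The point about list_empty() being bypassed can be reproduced in user space. The following is only a sketch under stated assumptions, not kernel code: `struct list_head` and `list_empty()` are re-declared locally with the same pointer comparison the kernel helper makes, `struct free_area` is cut down to the two fields shown in the dump, and `area_has_pages()` and `zeroed_area()` are hypothetical helpers named here for illustration. A zero-filled list head has next == NULL, which is not equal to the head's own address, so the "empty" test reports a non-empty list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Cut-down free_area: one free list plus its counter, as in the dump. */
struct free_area {
	struct list_head free_list;
	unsigned long nr_free;
};

/* An initialized empty list points back at its own head. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same comparison the kernel's list_empty() makes. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical helper: trust the counter rather than the list pointers. */
static bool area_has_pages(const struct free_area *area)
{
	return area->nr_free != 0;
}

/* Hypothetical helper: build a free_area that is all zero bytes,
 * matching next = 0x0, prev = 0x0, nr_free = 0 in the dump. */
static struct free_area zeroed_area(void)
{
	struct free_area area;

	memset(&area, 0, sizeof(area));
	return area;
}
```

On a zeroed area, `list_empty(&area.free_list)` is false while `area_has_pages(&area)` is also false, which is the mismatch the two proposed fixes address: calling `init_list_head()` makes the first test correct, and checking `nr_free` avoids relying on the pointers at all.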
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-15 10:38 ` Bharata B Rao
@ 2006-02-15 11:21 ` Andi Kleen
  2006-02-15 18:14 ` Christoph Lameter
  2006-02-16 5:18 ` Bharata B Rao
  2006-02-15 18:10 ` Christoph Lameter
  1 sibling, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2006-02-15 11:21 UTC (permalink / raw)
  To: bharata; +Cc: Christoph Lameter, Ray Bryant, discuss, linux-kernel

On Wednesday 15 February 2006 11:38, Bharata B Rao wrote:

>
> Even with this, mbind still needs to be fixed. Even though it
> can't get a conforming zone in the node (MPOL_BIND case),

It should just use lower zones then (e.g. if no ZONE_NORMAL
use ZONE_DMA32). yes that needs to be fixed.

How about the appended patch? Does it fix the problem for you?

-Andi

Handle all and empty zones when setting up custom zonelists for mbind

The memory allocator doesn't like empty zones (which have an
uninitialized freelist), so a x86-64 system with a node fully
in GFP_DMA32 only would crash on mbind.

Fix that up by putting all possible zones as fallback into the zonelist
and skipping the empty ones.

In fact the code always allocated enough space for all zones,
but only used it for the highest. This change just uses all the
memory that was allocated before.

This should work fine for now, but whoever implements node hot removal
needs to fix this somewhere else too (or make sure zone datastructures
by itself never go away, only their memory)

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/mm/mempolicy.c
===================================================================
--- linux.orig/mm/mempolicy.c
+++ linux/mm/mempolicy.c
@@ -132,19 +132,29 @@ static int mpol_check_policy(int mode, n
 	}
 	return nodes_subset(*nodes, node_online_map) ? 0 : -EINVAL;
 }
+
 /* Generate a custom zonelist for the BIND policy. */
 static struct zonelist *bind_zonelist(nodemask_t *nodes)
 {
 	struct zonelist *zl;
-	int num, max, nd;
+	int num, max, nd, k;
 
 	max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
-	zl = kmalloc(sizeof(void *) * max, GFP_KERNEL);
+	zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
 	if (!zl)
 		return NULL;
 	num = 0;
-	for_each_node_mask(nd, *nodes)
-		zl->zones[num++] = &NODE_DATA(nd)->node_zones[policy_zone];
+	/* First put in the highest zones from all nodes, then all the next
+	   lower zones etc. Avoid empty zones because the memory allocator
+	   doesn't like them. If you implement node hot removal you
+	   have to fix that. */
+	for (k = policy_zone; k >= 0; k--) {
+		for_each_node_mask(nd, *nodes) {
+			struct zone *z = &NODE_DATA(nd)->node_zones[k];
+			if (z->present_pages > 0)
+				zl->zones[num++] = z;
+		}
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }

^ permalink raw reply [flat|nested] 29+ messages in thread
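[Editorial note] The ordering the patch above produces can be tried outside the kernel. This is a rough user-space sketch, not the kernel implementation: `struct zone` and `struct node` are reduced to the single field the loop inspects, the zone indices are a local enum assumed to mirror the x86-64 layout discussed in the thread, a plain array of nodes stands in for the nodemask, and `build_bind_zonelist()` is a hypothetical name. The caller is assumed to provide room for `MAX_NR_ZONES * nr_nodes + 1` pointers, matching the `max` computed in the patch.

```c
#include <stddef.h>

/* Local zone indices for this sketch (assumed x86-64 ordering). */
enum { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

/* Reduced stand-ins: only the field the loop looks at. */
struct zone {
	unsigned long present_pages;
};

struct node {
	struct zone node_zones[MAX_NR_ZONES];
};

/*
 * Fill zl with every non-empty zone of the given nodes, highest zone
 * index first across all nodes, then the next lower index, and so on.
 * zl must hold MAX_NR_ZONES * nr_nodes + 1 entries; the list is
 * NULL-terminated. Returns the number of zones stored.
 */
static size_t build_bind_zonelist(struct node *nodes, size_t nr_nodes,
				  int policy_zone, struct zone **zl)
{
	size_t num = 0;
	size_t nd;
	int k;

	for (k = policy_zone; k >= 0; k--) {
		for (nd = 0; nd < nr_nodes; nd++) {
			struct zone *z = &nodes[nd].node_zones[k];

			/* Skip empty zones, as the patch does. */
			if (z->present_pages > 0)
				zl[num++] = z;
		}
	}
	zl[num] = NULL;
	return num;
}
```

For a node that lies entirely below 4GB (pages in ZONE_DMA and ZONE_DMA32, none in ZONE_NORMAL), binding to that node alone yields a list of ZONE_DMA32 followed by ZONE_DMA, whereas the pre-patch code would have produced a list containing only the empty ZONE_NORMAL.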
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-15 11:21 ` Andi Kleen
@ 2006-02-15 18:14 ` Christoph Lameter
  2006-02-16 5:18 ` Bharata B Rao
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Lameter @ 2006-02-15 18:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: bharata, Ray Bryant, discuss, linux-kernel

On Wed, 15 Feb 2006, Andi Kleen wrote:

> How about the appended patch? Does it fix the problem for you?

I think we still need to address the issue of being able to crash
the page allocator if an empty zone is in the zonelist.

> This should work fine for now, but whoever implements node hot removal
> needs to fix this somewhere else too (or make sure zone datastructures
> by itself never go away, only their memory)

Yup. Simply initializing the pcp structures with empty lists should
suffice though.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-15 11:21 ` Andi Kleen
  2006-02-15 18:14 ` Christoph Lameter
@ 2006-02-16 5:18 ` Bharata B Rao
  1 sibling, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2006-02-16 5:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Christoph Lameter, Ray Bryant, discuss, linux-kernel

On Wed, Feb 15, 2006 at 12:21:53PM +0100, Andi Kleen wrote:
> On Wednesday 15 February 2006 11:38, Bharata B Rao wrote:
>
> >
> > Even with this, mbind still needs to be fixed. Even though it
> > can't get a conforming zone in the node (MPOL_BIND case),
>
> It should just use lower zones then (e.g. if no ZONE_NORMAL
> use ZONE_DMA32). yes that needs to be fixed.
>
> How about the appended patch? Does it fix the problem for you?
>

Yes, this fixes the problem. The kernel and the application don't crash
now with this patch.

Regards,
Bharata.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64
  2006-02-15 10:38 ` Bharata B Rao
  2006-02-15 11:21 ` Andi Kleen
@ 2006-02-15 18:10 ` Christoph Lameter
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Lameter @ 2006-02-15 18:10 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Christoph Lameter, Andi Kleen, Ray Bryant, discuss, linux-kernel

On Wed, 15 Feb 2006, Bharata B Rao wrote:

> We don't initialize the free_area list for all zones. Instead,
> free_area_init_core() does that only for zones which are non-empty.

Right.

> But in __rmqueue(), we depend on these free_area lists to be initialized
> correctly for all zones, which is not true in the present case we
> are discussing.
> I think we either need to initialize free_area lists for all zones
> or check for !zone->free_area->nr_free in __rmqueue().

Or we can initialize all pcp to contain empty lists for zones without
pages.

> Even with this, mbind still needs to be fixed. Even though it
> can't get a conforming zone in the node (MPOL_BIND case), right now,
> it goes ahead with the "bind"ing of the memory area. This causes the
> application to crash (assuming we have fixed the __rmqueue kernel crash)
> (Haven't yet figured out why exactly the application dies)

The application crashes because of an OOM.

^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2006-02-16 5:14 UTC | newest]
Thread overview: 29+ messages
[not found] <20060205163618.GB21972@in.ibm.com>
2006-02-05 17:03 ` [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64 Andi Kleen
2006-02-06 16:11 ` Christoph Lameter
2006-02-06 18:12 ` Andi Kleen
2006-02-06 18:25 ` Christoph Lameter
2006-02-06 18:31 ` Andi Kleen
2006-02-06 18:45 ` Christoph Lameter
2006-02-06 18:55 ` Andi Kleen
2006-02-06 19:22 ` Christoph Lameter
2006-02-07 5:59 ` Bharata B Rao
2006-02-07 16:49 ` Christoph Lameter
2006-02-07 23:27 ` Ray Bryant
2006-02-07 23:36 ` Andi Kleen
2006-02-08 12:10 ` Bharata B Rao
2006-02-08 15:42 ` Christoph Lameter
2006-02-08 15:45 ` Andi Kleen
2006-02-08 15:59 ` Christoph Lameter
2006-02-08 16:06 ` Andi Kleen
2006-02-08 16:20 ` Christoph Lameter
2006-02-08 16:27 ` Andi Kleen
2006-02-08 16:51 ` Christoph Lameter
2006-02-09 4:39 ` Bharata B Rao
2006-02-09 9:58 ` Andi Kleen
2006-02-14 19:33 ` Christoph Lameter
2006-02-15 5:46 ` Bharata B Rao
2006-02-15 10:38 ` Bharata B Rao
2006-02-15 11:21 ` Andi Kleen
2006-02-15 18:14 ` Christoph Lameter
2006-02-16 5:18 ` Bharata B Rao
2006-02-15 18:10 ` Christoph Lameter