* [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
@ 2007-02-08 2:06 KAMEZAWA Hiroyuki
2007-02-08 7:49 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-08 2:06 UTC (permalink / raw)
To: LKML; +Cc: clameter, Andrew Morton, Andi Kleen, GOTO
Hi, This is much easier fix than previous one...
--
following is back trace of NULL pointer access in slab_node().
This patch fix this.
== backtrace from crash (linux-2.6.20) ==
#0 [BSP:e000000121f412d8] schedule at a00000010061ccc0
#1 [BSP:e000000121f41280] rwsem_down_failed_common at a000000100290490
#2 [BSP:e000000121f41260] rwsem_down_read_failed at a000000100620d30
#3 [BSP:e000000121f41240] down_read at a0000001000b01a0
#4 [BSP:e000000121f411e8] ia64_do_page_fault at a000000100625710
#5 [BSP:e000000121f411e8] ia64_leave_kernel at a00000010000c660
EFRAME: e000000121f47100
B0: a00000010013cc40 CR_IIP: a00000010012aa30
CR_IPSR: 0000101008022018 CR_IFS: 8000000000000205
AR_PFS: 0000000000000309 AR_RSC: 0000000000000003
AR_UNAT: 0000000000000000 AR_RNAT: 0000000000000000
AR_CCV: 0000000000000000 AR_FPSR: 0009804c8a70033f
LOADRS: 0000000000000000 AR_BSPSTORE: 0000000000000000
B6: a00000010003f040 B7: a00000010000ccd0
PR: 000000000055a9a5 R1: a000000100d5a5b0
R2: e00000010c50df7c R3: 0000000000000030
R8: 0000000000000000 R9: e00000011dc52930
R10: e00000011dc52928 R11: e00000010c50df80
R12: e000000121f472c0 R13: e000000121f40000
R14: 0000000000000002 R15: 000000003fffff00
R16: 0000000010400000 R17: e000000121f40000
R18: a000000100b5a9d0 R19: e000000121f40018
R20: e000000121f40c84 R21: 0000000000000000
R22: e000000121f47330 R23: e000000121f47334
R24: e000000121f40b88 R25: e000000121f47340
R26: e000000121f47334 R27: 0000000000000000
R28: 0000000000000000 R29: e000000121f47338
R30: 000000007fffffff R31: a000000100b5b5e0
F6: 1003eccd55056199632ec F7: 1003e9e3779b97f4a7c16
F8: 1003e0a00000010001422 F9: 1003e000000000fa00000
F10: 1003e000000003b9aca00 F11: 1003e431bde82d7b634db
#6 [BSP:e000000121f411c0] slab_node at a00000010012aa30
#7 [BSP:e000000121f41190] alternate_node_alloc at a00000010013cc40
#8 [BSP:e000000121f41160] kmem_cache_alloc at a00000010013dc40
#9 [BSP:e000000121f41100] desc_prologue at a00000010003ee00
#10 [BSP:e000000121f410c0] unw_decode_r2 at a00000010003f0c0
#11 [BSP:e000000121f41068] find_save_locs at a00000010003fbf0
#12 [BSP:e000000121f41038] unw_init_frame_info at a000000100040900
#13 [BSP:e000000121f41010] unw_init_running at a00000010000ccf0
==
This panic(hang) was found by a numa test-set on a system with 3 nodes, where
node(2) was memory-less-node.
This patch fixes zero-length zonelist problem in MPOL_MBIND.
If the length of zonelist is zero, just returns -EINVAL.
Changelog: v1 -> v2
- avoid extra pgdat scanning....it is not necessary.
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: linux-2.6.20/mm/mempolicy.c
===================================================================
--- linux-2.6.20.orig/mm/mempolicy.c 2007-02-08 09:50:45.000000000 +0900
+++ linux-2.6.20/mm/mempolicy.c 2007-02-08 10:35:53.000000000 +0900
@@ -144,7 +144,7 @@
max++; /* space for zlcache_ptr (see mmzone.h) */
zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
if (!zl)
- return NULL;
+ return PTR_ERR(-ENOMEM);
zl->zlcache_ptr = NULL;
num = 0;
/* First put in the highest zones from all nodes, then all the next
@@ -162,6 +162,10 @@
break;
k--;
}
+ if (!num) {
+ kfree(zl);
+ return PTR_ERR(-EINVAL);
+ }
zl->zones[num] = NULL;
return zl;
}
@@ -170,6 +174,7 @@
static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
{
struct mempolicy *policy;
+ void *val;
PDprintk("setting mode %d nodes[0] %lx\n", mode, nodes_addr(*nodes)[0]);
if (mode == MPOL_DEFAULT)
@@ -192,11 +197,12 @@
policy->v.preferred_node = -1;
break;
case MPOL_BIND:
- policy->v.zonelist = bind_zonelist(nodes);
- if (policy->v.zonelist == NULL) {
+ val = bind_zonelist(nodes);
+ if (IS_ERR(val)) {
kmem_cache_free(policy_cache, policy);
- return ERR_PTR(-ENOMEM);
+ return val;
}
+ policy->v.zonelist = val;
break;
}
policy->policy = mode;
@@ -1662,12 +1668,12 @@
zonelist = bind_zonelist(&nodes);
- /* If no mem, then zonelist is NULL and we keep old zonelist.
+ /* If no mem, then zonelist is ERR_PTR and we keep old zonelist.
* If that old zonelist has no remaining mems_allowed nodes,
* then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT.
*/
- if (zonelist) {
+ if (!IS_ERR(zonelist)) {
/* Good - got mem - substitute new zonelist */
kfree(pol->v.zonelist);
pol->v.zonelist = zonelist;
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 2:06 [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2 KAMEZAWA Hiroyuki
@ 2007-02-08 7:49 ` Andi Kleen
2007-02-08 8:00 ` Andrew Morton
2007-02-08 8:16 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2007-02-08 7:49 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: LKML, clameter, Andrew Morton, GOTO
> This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> node(2) was memory-less-node.
I still think it's the wrong fix -- just get rid of the memory less node.
I expect you'll likely run into more problems with that setup anyways.
> static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
> {
> struct mempolicy *policy;
> + void *val;
Using void * here is nasty when it's a zonelist pointer.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 7:49 ` Andi Kleen
@ 2007-02-08 8:00 ` Andrew Morton
2007-02-08 8:03 ` Andi Kleen
2007-02-08 8:16 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 8:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO
On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
>
> > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > node(2) was memory-less-node.
>
> I still think it's the wrong fix -- just get rid of the memory less node.
"Let's break it even more"?
> I expect you'll likely run into more problems with that setup anyways.
What happens if he doesn't run into more problems?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 8:00 ` Andrew Morton
@ 2007-02-08 8:03 ` Andi Kleen
2007-02-08 8:08 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-02-08 8:03 UTC (permalink / raw)
To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO
On Thursday 08 February 2007 09:00, Andrew Morton wrote:
> On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
>
> >
> > > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > > node(2) was memory-less-node.
> >
> > I still think it's the wrong fix -- just get rid of the memory less node.
>
> "Let's break it even more"?
I still don't get what you believe what would be broken then.
> > I expect you'll likely run into more problems with that setup anyways.
>
> What happens if he doesn't run into more problems?
Then he's lucky. I ran into problems at least when I still had the empty
nodes some time ago on x86-64. Christoph said SN2 is doing the same.
iirc slab blew up at least, but that might be fixed by now. But it's a little risky
because there is more code now that is node aware.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 8:03 ` Andi Kleen
@ 2007-02-08 8:08 ` Andrew Morton
2007-02-08 8:19 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 8:08 UTC (permalink / raw)
To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO
On Thu, 8 Feb 2007 09:03:46 +0100 Andi Kleen <ak@suse.de> wrote:
> On Thursday 08 February 2007 09:00, Andrew Morton wrote:
> > On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
> >
> > >
> > > > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > > > node(2) was memory-less-node.
> > >
> > > I still think it's the wrong fix -- just get rid of the memory less node.
> >
> > "Let's break it even more"?
>
> I still don't get what you believe what would be broken then.
A node with no memory is physical reality. The kernel should do its best
handle and report it accurately. Pretending that the CPUs on that node are
local to a different node's memory (as I understand your proposal) goes
against that.
> > > I expect you'll likely run into more problems with that setup anyways.
> >
> > What happens if he doesn't run into more problems?
>
> Then he's lucky. I ran into problems at least when I still had the empty
> nodes some time ago on x86-64. Christoph said SN2 is doing the same.
>
> iirc slab blew up at least, but that might be fixed by now. But it's a little risky
> because there is more code now that is node aware.
>
Well... I'd suggest that we try to struggle on, get it working. Is there
a downside to doing that?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 7:49 ` Andi Kleen
2007-02-08 8:00 ` Andrew Morton
@ 2007-02-08 8:16 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-02-08 8:16 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, clameter, akpm, y-goto
On Thu, 8 Feb 2007 08:49:41 +0100
Andi Kleen <ak@suse.de> wrote:
>
> > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > node(2) was memory-less-node.
>
> I still think it's the wrong fix -- just get rid of the memory less node.
> I expect you'll likely run into more problems with that setup anyways.
>
memory-less-node problem is a bit big issue for this small bug.
For removing memory-less-node, we'll have to make consensus on
"a node represents a group of memory, not cpu, not devices"
I read old e-mails on node-hotplug again.
In the early stage of node-hotplug, I assumes "a node has memory".
After discussion between several groups, I knew there are people
who consider groups of something as node.
(and we had to change our plan......)
like this (old) homepage's FAQ 5. http://lse.sourceforge.net/numa/faq/
> > static struct mempolicy *mpol_new(int mode, nodemask_t *nodes)
> > {
> > struct mempolicy *policy;
> > + void *val;
>
> Using void * here is nasty when it's a zonelist pointer.
>
okay. changes this to be more neat one..take3 will come.
-Kame
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 8:08 ` Andrew Morton
@ 2007-02-08 8:19 ` Andi Kleen
2007-02-08 8:24 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2007-02-08 8:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO
On Thursday 08 February 2007 09:08, Andrew Morton wrote:
> On Thu, 8 Feb 2007 09:03:46 +0100 Andi Kleen <ak@suse.de> wrote:
>
> > On Thursday 08 February 2007 09:00, Andrew Morton wrote:
> > > On Thu, 8 Feb 2007 08:49:41 +0100 Andi Kleen <ak@suse.de> wrote:
> > >
> > > >
> > > > > This panic(hang) was found by a numa test-set on a system with 3 nodes, where
> > > > > node(2) was memory-less-node.
> > > >
> > > > I still think it's the wrong fix -- just get rid of the memory less node.
> > >
> > > "Let's break it even more"?
> >
> > I still don't get what you believe what would be broken then.
>
> A node with no memory is physical reality. The kernel should do its best
> handle and report it accurately.
What would be the advantage of that?
The reason we present nodes to user space is that we can tell the user
where the memory is. You seem to try to promote it to some abstract entity
beyond that, but that doesn't seem particularly fruitful to me. I think
I prefer "down to earth" memory nodes.
> Pretending that the CPUs on that node are
> local to a different node's memory (as I understand your proposal) goes
> against that.
Well it is then most local to that node's memory.
>
> > > > I expect you'll likely run into more problems with that setup anyways.
> > >
> > > What happens if he doesn't run into more problems?
> >
> > Then he's lucky. I ran into problems at least when I still had the empty
> > nodes some time ago on x86-64. Christoph said SN2 is doing the same.
> >
> > iirc slab blew up at least, but that might be fixed by now. But it's a little risky
> > because there is more code now that is node aware.
> >
>
> Well... I'd suggest that we try to struggle on, get it working. Is there
> a downside to doing that?
The main problem I used to have was regressions. e.g. having memory
less nodes in different combinations would break in new releases.
(on x86-64 this depends on how the DIMMs are distributed so sometimes
breakage wasn't noticed for a long time. We ended up trying all combinations
on a simulator in the end). Assigning to nearby nodes avoided that special
case.
Ok there is still one other case -- nodes that have memory, but not enough
to do anything useful. This can happen too, but is still somewhat different.
Basically all kmalloc_node() need to fail gratiously.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2
2007-02-08 8:19 ` Andi Kleen
@ 2007-02-08 8:24 ` Andrew Morton
0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 8:24 UTC (permalink / raw)
To: Andi Kleen; +Cc: KAMEZAWA Hiroyuki, LKML, clameter, GOTO
On Thu, 8 Feb 2007 09:19:16 +0100 Andi Kleen <ak@suse.de> wrote:
> The reason we present nodes to user space is that we can tell the user
> where the memory is. You seem to try to promote it to some abstract entity
> beyond that, but that doesn't seem particularly fruitful to me. I think
> I prefer "down to earth" memory nodes.
Who said a node is all about memory?
A node is (often) a circuit board, with an edge connector, containing some,
all or even none of a) CPUs, b) memory and c) IO devices.
That is how a kernel should model and treat it, surely? That's reality.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-02-08 8:24 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-08 2:06 [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take2 KAMEZAWA Hiroyuki
2007-02-08 7:49 ` Andi Kleen
2007-02-08 8:00 ` Andrew Morton
2007-02-08 8:03 ` Andi Kleen
2007-02-08 8:08 ` Andrew Morton
2007-02-08 8:19 ` Andi Kleen
2007-02-08 8:24 ` Andrew Morton
2007-02-08 8:16 ` KAMEZAWA Hiroyuki
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.