* [PATCH] Add nid sanity on alloc_pages_node @ 2007-07-13 2:45 Joe Jin 2007-07-13 5:18 ` Andrew Morton 0 siblings, 1 reply; 37+ messages in thread From: Joe Jin @ 2007-07-13 2:45 UTC (permalink / raw) To: akpm, bill.irwin; +Cc: linux-kernel This patch adds a nid sanity check to alloc_pages_node(). When two processes change nr_hugepages on a system, alloc_fresh_huge_page() gets called concurrently. In that function nid is defined as a static variable with no protection, so if two processes call it at the same time, an invalid nid may be passed to alloc_pages_node(). We have hit this with the following script: #!/bin/bash while : ; do echo 1000000000000 > /proc/sys/vm/nr_hugepages echo 1 > /proc/sys/vm/nr_hugepages echo 10000000000000000000 > /proc/sys/vm/nr_hugepages echo 0 > /proc/sys/vm/nr_hugepages done Run the script in _two_ different terminals; after a short time a kernel panic will be printed. Signed-off-by: Joe Jin <joe.jin@oracle.com> --- --- linux-2.6.22/include/linux/gfp.h.orig 2007-07-12 15:06:23.000000000 +0800 +++ linux-2.6.22/include/linux/gfp.h 2007-07-12 15:02:59.000000000 +0800 @@ -133,6 +133,9 @@ /* Unknown node is current node */ if (nid < 0) nid = numa_node_id(); + + if (unlikely(nid == MAX_NUMNODES)) + nid = first_node(node_online_map); return __alloc_pages(gfp_mask, order, NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask)); ^ permalink raw reply [flat|nested] 37+ messages in thread
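[Editor's note] The window is easiest to see in a userspace model of the pre-fix code path. This is a hypothetical sketch, not the kernel source: next_node() is simplified to a plain increment, MAX_NUMNODES is picked arbitrarily, and the two racing callers' steps are interleaved by hand.

```c
#include <assert.h>

#define MAX_NUMNODES 4

/* Model of the old alloc_fresh_huge_page() round-robin: the static
 * index is advanced first and only afterwards checked against
 * MAX_NUMNODES.  A second caller that reads the index between those
 * two steps sees the out-of-range value and would hand it straight
 * to alloc_pages_node(). */
static int nid = MAX_NUMNODES - 1;      /* cursor sits on the last node */

static int next_node_model(int n)       /* simplified next_node() */
{
        return n + 1;
}

static void caller_a_advance(void)      /* caller A, step 1 */
{
        nid = next_node_model(nid);
}

static int caller_b_read(void)          /* caller B runs in the window */
{
        return nid;
}

static void caller_a_wrap(void)         /* caller A, step 2: too late */
{
        if (nid == MAX_NUMNODES)
                nid = 0;                /* first_node(node_online_map) */
}

/* One hand-interleaved schedule that exposes the bug. */
static int race_observed_nid(void)
{
        int seen;

        caller_a_advance();     /* A: nid becomes MAX_NUMNODES */
        seen = caller_b_read(); /* B: would allocate with the invalid nid */
        caller_a_wrap();        /* A: the wraparound check runs too late */
        return seen;
}
```

With the lock-free ordering above, caller B observes nid == MAX_NUMNODES, exactly the out-of-range value that makes NODE_DATA(nid) explode.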
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 2:45 [PATCH] Add nid sanity on alloc_pages_node Joe Jin @ 2007-07-13 5:18 ` Andrew Morton 2007-07-13 6:40 ` Joe Jin 2007-07-17 15:04 ` Hugh Dickins 0 siblings, 2 replies; 37+ messages in thread From: Andrew Morton @ 2007-07-13 5:18 UTC (permalink / raw) To: Joe Jin; +Cc: bill.irwin, linux-kernel On Fri, 13 Jul 2007 10:45:07 +0800 Joe Jin <joe.jin@oracle.com> wrote: > This patch add nid sanity check on alloc_pages_node(). > While two process change nr_hugepages at a system, alloc_fresh_huge_page() > been called, at this function, nid defined as a static variable, but, there > is not any protection of, if 2 process called at the same time, maybe pass a > invalid nid to alloc_pages_node. > > We have hit it by following scripts: > > #!/bin/bash > while : ; do > echo 1000000000000 > /proc/sys/vm/nr_hugepages > echo 1 > /proc/sys/vm/nr_hugepages > echo 10000000000000000000 > /proc/sys/vm/nr_hugepages > echo 0 > /proc/sys/vm/nr_hugepages > done > > > Run the script at _two_ difference terminal, after a short time, a kernel panic > info will print. > > > Signed-off-by: Joe Jin <joe.jin@oracle.com> > --- > > --- linux-2.6.22/include/linux/gfp.h.orig 2007-07-12 15:06:23.000000000 +0800 > +++ linux-2.6.22/include/linux/gfp.h 2007-07-12 15:02:59.000000000 +0800 > @@ -133,6 +133,9 @@ > /* Unknown node is current node */ > if (nid < 0) > nid = numa_node_id(); > + > + if (unlikely(nid == MAX_NUMNODES)) > + nid = first_node(node_online_map); > > return __alloc_pages(gfp_mask, order, > NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask)); alloc_pages_node() is pretty much the last place where we want to fix this: it adds more cycles and more code to many important codepaths in the kernel. It'd be much better to fix the race within alloc_fresh_huge_page(). That function is pretty pathetic. Something like this? 
--- a/mm/hugetlb.c~a +++ a/mm/hugetlb.c @@ -105,13 +105,20 @@ static void free_huge_page(struct page * static int alloc_fresh_huge_page(void) { - static int nid = 0; + static int prev_nid; + static DEFINE_SPINLOCK(nid_lock); struct page *page; - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, - HUGETLB_PAGE_ORDER); - nid = next_node(nid, node_online_map); + int nid; + + spin_lock(&nid_lock); + nid = next_node(prev_nid, node_online_map); if (nid == MAX_NUMNODES) nid = first_node(node_online_map); + prev_nid = nid; + spin_unlock(&nid_lock); + + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, + HUGETLB_PAGE_ORDER); if (page) { set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); _ ^ permalink raw reply [flat|nested] 37+ messages in thread
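[Editor's note] As a sanity check on the approach, the compute-then-publish pattern of the patch can be modeled in userspace. This is a sketch, not the kernel code: the node count and the all-nodes-online map are assumptions, and the spinlock is modeled with a simple assert-checked flag.

```c
#include <assert.h>

#define MAX_NUMNODES 4          /* assume nodes 0..3 are all online */

static int prev_nid;            /* matches the patch: starts at 0 */
static int nid_lock;            /* stand-in for the spinlock */

static void lock(void)   { assert(!nid_lock); nid_lock = 1; }
static void unlock(void) { assert(nid_lock);  nid_lock = 0; }

static int next_node_model(int n)       /* simplified next_node() */
{
        return n + 1;
}

/* The fixed ordering: the next nid is computed, wrapped and published
 * while the lock is held, so no caller can ever observe a value
 * outside [0, MAX_NUMNODES). */
static int pick_nid(void)
{
        int nid;

        lock();
        nid = next_node_model(prev_nid);
        if (nid == MAX_NUMNODES)
                nid = 0;        /* first_node(node_online_map) */
        prev_nid = nid;
        unlock();
        return nid;
}
```

Successive calls rotate 1, 2, 3, 0, 1, ... and every returned value is a valid node index; the allocation itself then happens outside the lock, as in the patch.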
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 5:18 ` Andrew Morton @ 2007-07-13 6:40 ` Joe Jin 2007-07-13 6:49 ` Andrew Morton 2007-07-17 15:04 ` Hugh Dickins 1 sibling, 1 reply; 37+ messages in thread From: Joe Jin @ 2007-07-13 6:40 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel, gurudas.pai On 2007-07-12 22:18, Andrew Morton wrote: > On Fri, 13 Jul 2007 10:45:07 +0800 Joe Jin <joe.jin@oracle.com> wrote: > > > This patch add nid sanity check on alloc_pages_node(). > > While two process change nr_hugepages at a system, alloc_fresh_huge_page() > > been called, at this function, nid defined as a static variable, but, there > > is not any protection of, if 2 process called at the same time, maybe pass a > > invalid nid to alloc_pages_node. > > > > We have hit it by following scripts: > > > > #!/bin/bash > > while : ; do > > echo 1000000000000 > /proc/sys/vm/nr_hugepages > > echo 1 > /proc/sys/vm/nr_hugepages > > echo 10000000000000000000 > /proc/sys/vm/nr_hugepages > > echo 0 > /proc/sys/vm/nr_hugepages > > done > > > > > > Run the script at _two_ difference terminal, after a short time, a kernel panic > > info will print. > > > > > > Signed-off-by: Joe Jin <joe.jin@oracle.com> > > --- > > > > --- linux-2.6.22/include/linux/gfp.h.orig 2007-07-12 15:06:23.000000000 +0800 > > +++ linux-2.6.22/include/linux/gfp.h 2007-07-12 15:02:59.000000000 +0800 > > @@ -133,6 +133,9 @@ > > /* Unknown node is current node */ > > if (nid < 0) > > nid = numa_node_id(); > > + > > + if (unlikely(nid == MAX_NUMNODES)) > > + nid = first_node(node_online_map); > > > > return __alloc_pages(gfp_mask, order, > > NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask)); > > alloc_pages_node() is pretty much the last place where we want to fix this: > it adds more cycles and more code to many important codepaths in the > kernel. > > It'd be much better to fix the race within alloc_fresh_huge_page(). That > function is pretty pathetic. 
> > Something like this? > > --- a/mm/hugetlb.c~a > +++ a/mm/hugetlb.c > @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > > static int alloc_fresh_huge_page(void) > { > - static int nid = 0; > + static int prev_nid; > + static DEFINE_SPINLOCK(nid_lock); > struct page *page; > - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > - HUGETLB_PAGE_ORDER); > - nid = next_node(nid, node_online_map); > + int nid; > + > + spin_lock(&nid_lock); > + nid = next_node(prev_nid, node_online_map); > if (nid == MAX_NUMNODES) > nid = first_node(node_online_map); > + prev_nid = nid; > + spin_unlock(&nid_lock); > + > + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > + HUGETLB_PAGE_ORDER); > if (page) { > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > _ > The patch looks good for this bug, thanks :) But if another caller passes an invalid nid to alloc_pages_node(), __alloc_pages will crash again, so I think adding a sanity check for nid in alloc_pages_node() is still worthwhile. Another question: if (nid >= MAX_NUMNODES), may I set nid to 0 directly, like the following code? if (unlikely(nid >= MAX_NUMNODES)) nid = 0; Thanks, Joe ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 6:40 ` Joe Jin @ 2007-07-13 6:49 ` Andrew Morton 2007-07-13 6:57 ` Andrew Morton ` (3 more replies) 0 siblings, 4 replies; 37+ messages in thread From: Andrew Morton @ 2007-07-13 6:49 UTC (permalink / raw) To: Joe Jin; +Cc: bill.irwin, linux-kernel, gurudas.pai On Fri, 13 Jul 2007 14:40:04 +0800 Joe Jin <joe.jin@oracle.com> wrote: > On 2007-07-12 22:18, Andrew Morton wrote: > > On Fri, 13 Jul 2007 10:45:07 +0800 Joe Jin <joe.jin@oracle.com> wrote: > > > > Something like this? > > > > --- a/mm/hugetlb.c~a > > +++ a/mm/hugetlb.c > > @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > > > > static int alloc_fresh_huge_page(void) > > { > > - static int nid = 0; > > + static int prev_nid; > > + static DEFINE_SPINLOCK(nid_lock); > > struct page *page; > > - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > - HUGETLB_PAGE_ORDER); > > - nid = next_node(nid, node_online_map); > > + int nid; > > + > > + spin_lock(&nid_lock); > > + nid = next_node(prev_nid, node_online_map); > > if (nid == MAX_NUMNODES) > > nid = first_node(node_online_map); > > + prev_nid = nid; > > + spin_unlock(&nid_lock); > > + > > + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > + HUGETLB_PAGE_ORDER); > > if (page) { > > set_compound_page_dtor(page, free_huge_page); > > spin_lock(&hugetlb_lock); > > _ > > > > The patch looks good for this bug, thanks :) If you have time could you test it and send it back to me please? > if other caller give a invalid nid to alloc_pages_node(), __alloc_pages > will crash again. That would be a buggy caller, so we should fix that caller. > So I think we add some sanity check for nid at alloc_pages_node is > meaningful. > > another question, if (nid >= MAX_NUMNODES), may I set nid to 0 directly > like following code? 
> > if (unlikely(nid >= MAX_NUMNODES)) > nid = 0; if (nid > MAX_NUMNODES) then that is a bug and we should report it (doing this via a BUG() is OK) rather than quietly covering it up. if (nid == MAX_NUMNODES) then we should set it to first_node(node_online_map); ^ permalink raw reply [flat|nested] 37+ messages in thread
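[Editor's note] The two-case policy described above can be written down as a small sketch. This is an illustration of the rule, not a proposed kernel patch: bug_hit stands in for BUG() firing, and node 0 stands in for first_node(node_online_map).

```c
#include <assert.h>

#define MAX_NUMNODES 4
#define FIRST_ONLINE_NODE 0     /* stand-in for first_node(node_online_map) */

static int bug_hit;             /* stand-in for BUG() having fired */

/* nid > MAX_NUMNODES is a caller bug and gets reported loudly;
 * nid == MAX_NUMNODES is the documented next_node() end marker
 * and is remapped to the first online node. */
static int sanitize_nid(int nid)
{
        if (nid > MAX_NUMNODES) {
                bug_hit = 1;    /* the kernel would BUG() here */
                return -1;
        }
        if (nid == MAX_NUMNODES)
                return FIRST_ONLINE_NODE;
        return nid;
}
```

The distinction matters because nid == MAX_NUMNODES is a value next_node() legitimately produces at end-of-map, while anything beyond it can only come from a corrupted or buggy caller, which silently remapping would hide.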
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 6:49 ` Andrew Morton @ 2007-07-13 6:57 ` Andrew Morton 2007-07-13 8:29 ` Paul Jackson 2007-07-13 8:03 ` Joe Jin ` (2 subsequent siblings) 3 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-13 6:57 UTC (permalink / raw) To: Joe Jin, bill.irwin, linux-kernel, gurudas.pai; +Cc: Paul Jackson On Thu, 12 Jul 2007 23:49:38 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > first_node(node_online_map); Incidentally, we have a helper for this operation: #if MAX_NUMNODES > 1 ... #define first_online_node first_node(node_online_map) ... #else ... #define first_online_node 0 ... #endif So I suppose we should use that - it should generate better code for non-NUMA builds. It's a pretty sadly implemented helper though. It should be "first_online_node()", not "first_online_node". I'm scratching my head over that min_t in __first_node(), too. I don't think it's possible for find_first_bit(..., N) to return anything >N _anyway_. And if it does, we want to know about it. <looks at Paul> ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 6:57 ` Andrew Morton @ 2007-07-13 8:29 ` Paul Jackson 2007-07-13 8:38 ` Andrew Morton 0 siblings, 1 reply; 37+ messages in thread From: Paul Jackson @ 2007-07-13 8:29 UTC (permalink / raw) To: Andrew Morton Cc: joe.jin, bill.irwin, linux-kernel, gurudas.pai, Zwane Mwaikambo > I'm scratching my head over that min_t in __first_node(), too. I don't think > it's possible for find_first_bit(..., N) to return anything >N _anyway_. And if > it does, we want to know about it. > > <looks at Paul> I'm not sure I've got this right, but looks like that min_t went in after Zwane Mwaikambo, then <zwane@fsmlabs.com>, whom I am presuming is the same person as now at <zwane@arm.linux.org.uk>, found a problem with the i386 find_next_bit implementation returning > N when merging i386 cpu hotplug. See the thread: http://lkml.org/lkml/2004/7/31/102 [PATCH][2.6] first/next_cpu returns values > NR_CPUS I apparently lobbied at the time to mandate that find_first_bit(..., N) return exactly N on failure to find a set bit, but gave up, after some confusions on my part. Adding Zwane to this thread -- the other participant on that thread, Bill Irwin, is already on the CC list. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:29 ` Paul Jackson @ 2007-07-13 8:38 ` Andrew Morton 2007-07-13 8:43 ` Paul Jackson 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-13 8:38 UTC (permalink / raw) To: Paul Jackson Cc: joe.jin, bill.irwin, linux-kernel, gurudas.pai, Zwane Mwaikambo On Fri, 13 Jul 2007 01:29:06 -0700 Paul Jackson <pj@sgi.com> wrote: > > I'm scratching my head over that min_t in __first_node(), too. I don't think > > it's possible for find_first_bit(..., N) to return anything >N _anyway_. And if > > it does, we want to know about it. > > > > <looks at Paul> > > I'm not sure I've got this right, but looks like that min_t went in after > Zwane Mwaikambo, then <zwane@fsmlabs.com>, whom I am presuming is the same > person as now at <zwane@arm.linux.org.uk>, found a problem with the i386 > find_next_bit implementation returning > N when merging i386 cpu hotplug. Ah, Zwane was involved - say no more ;) > See the thread: > > http://lkml.org/lkml/2004/7/31/102 > [PATCH][2.6] first/next_cpu returns values > NR_CPUS > > I apparently lobbied at the time to mandate that find_first_bit(..., N) > return exactly N on failure to find a set bit, but gave up, after some > confusions on my part. iirc, find_first_bit(..., N) _must_ return N on nothing-found. It'd be untidy to return some randomly-larger number. I wonder which was the culpable architecture? Oh, i386. Note how the i386 implementation's documentation carefully avoids describing the return value. I don't think _any_ of our find_foo_bit() implementations have return-value docs, and here we see the result. Sigh. What crap. I guess we leave it as-is. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:38 ` Andrew Morton @ 2007-07-13 8:43 ` Paul Jackson 2007-07-13 8:49 ` Andrew Morton 0 siblings, 1 reply; 37+ messages in thread From: Paul Jackson @ 2007-07-13 8:43 UTC (permalink / raw) To: Andrew Morton; +Cc: joe.jin, bill.irwin, linux-kernel, gurudas.pai, zwane > Sigh. What crap. I guess we leave it as-is. That matches my conclusion. The thought of exposing a bug in some user of this find bit code by tightening up the API to return == N, not >= N, scares poor little old me. I can imagine a nasty bug coming from this. If someone bold enough and strong enough to slay such dragons came along, I'd gladly cheer them along. But such people are a rare breed, and probably have bigger dragons to slay first. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:43 ` Paul Jackson @ 2007-07-13 8:49 ` Andrew Morton 2007-07-13 8:54 ` Paul Jackson 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-13 8:49 UTC (permalink / raw) To: Paul Jackson; +Cc: joe.jin, bill.irwin, linux-kernel, gurudas.pai, zwane On Fri, 13 Jul 2007 01:43:49 -0700 Paul Jackson <pj@sgi.com> wrote: > > Sigh. What crap. I guess we leave it as-is. > > That matches my conclusion. > > The thought of exposing a bug in some user of this find bit code by > tightening up the API to return == N, not >= N, scares poor little old > me. I can imagine a nasty bug coming from this > > If someone bold enough and strong enough to slay such dragons came > along, I'd gladly cheer them along. But such people are a rare breed, > and probably have bigger dragons to slay first. > We could do this: add a little bitops torture-test and run that unconditionally, nice and early. Something for the young and keen amongst us... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:49 ` Andrew Morton @ 2007-07-13 8:54 ` Paul Jackson 2007-07-13 12:48 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 37+ messages in thread From: Paul Jackson @ 2007-07-13 8:54 UTC (permalink / raw) To: Andrew Morton; +Cc: joe.jin, bill.irwin, linux-kernel, gurudas.pai, zwane > Something for the young ... ... ancient self smiles with relief, glad that Andrew is not looking this way ;). > We could do this: add a little bitops torture-test and run that > unconditionally, nice and early. Good idea. That would give widespread coverage across all arches, verifying that they obey the rules of the tighter API (find_first_bit returning exactly N on finding no bits set) before we take the risk of actually relying on the changed API by removing those min_t() guards. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 37+ messages in thread
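[Editor's note] A minimal torture-test along those lines might compare each architecture's find_first_bit() against a reference that honors the tightened contract. The reference below is a sketch of the contract only, not any arch's actual implementation; the test bitmaps are arbitrary examples.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Reference semantics under the tightened contract: return the index
 * of the first set bit, or exactly `size` (never more) when no bit
 * in [0, size) is set. */
static unsigned long ref_find_first_bit(const unsigned long *addr,
                                        unsigned long size)
{
        unsigned long i;

        for (i = 0; i < size; i++)
                if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
                        return i;
        return size;
}

/* Example bitmaps for the contract checks. */
static const unsigned long empty_map[1] = { 0 };
static const unsigned long bit5_map[1]  = { 1UL << 5 };
```

The not-found case and the bit-beyond-size case are exactly the ones where the i386 implementation could return a value larger than N, which is what the min_t() guard in __first_node() was papering over.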
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:54 ` Paul Jackson @ 2007-07-13 12:48 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 37+ messages in thread From: Benjamin Herrenschmidt @ 2007-07-13 12:48 UTC (permalink / raw) To: Paul Jackson Cc: Andrew Morton, joe.jin, bill.irwin, linux-kernel, gurudas.pai, zwane On Fri, 2007-07-13 at 01:54 -0700, Paul Jackson wrote: > > Something for the young ... > > ... ancient self smiles with relief, glad that Andrew is not looking > this way ;). > > > We could do this: add a little bitops torture-test and run that > > unconditionally, nice and early. > > Good idea. That would allow for wide spread coverage across all arch's > that they obeyed the rules for the tighter API - find_first_bit returning > exactly N on finding no bits set, before we took the risk of actually > relying on the changed API, by removing those min_t() guards. That, the lockdep tests, the RCU tests... Looks like the kernel is building up its own test suite :-) That's a -good- thing, maybe we should formalize it a bit (separate the "test suite" config options)? Ben. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 6:49 ` Andrew Morton 2007-07-13 6:57 ` Andrew Morton @ 2007-07-13 8:03 ` Joe Jin 2007-07-13 8:15 ` Andrew Morton 2007-07-13 8:04 ` gurudas pai 2007-07-13 8:37 ` Joe Jin 3 siblings, 1 reply; 37+ messages in thread From: Joe Jin @ 2007-07-13 8:03 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel, gurudas.pai > > > > The patch looks good for this bug, thanks :) > > If you have time could you test it and sent it back at me please? > Test passed and panic gone. thanks. > > if other caller give a invalid nid to alloc_pages_node(), __alloc_pages > > will crash again. > > That would be a buggy caller, so we should fix that caller. > > > So I think we add some sanity check for nid at alloc_pages_node is > > meaningful. > > > > another question, if (nid >= MAX_NUMNODES), may I set nid to 0 directly > > like following code? > > > > if (unlikly(nid >= MAX_NUMNODES) > > nid = 0 > > if (nid > MAX_NUMNODES) then that is a bug and we should report it (doing > this via a BUG() is OK) rather than quietly covering it up. > > if (nid == MAX_NUMNODES) then we should set it to > first_node(node_online_map); OK, will do it and send you a patch :) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:03 ` Joe Jin @ 2007-07-13 8:15 ` Andrew Morton 2007-07-13 12:18 ` Joe Jin 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-13 8:15 UTC (permalink / raw) To: Joe Jin; +Cc: bill.irwin, linux-kernel, gurudas.pai, Paul Jackson On Fri, 13 Jul 2007 16:03:36 +0800 Joe Jin <joe.jin@oracle.com> wrote: > > > > > > The patch looks good for this bug, thanks :) > > > > If you have time could you test it and sent it back at me please? > > > > Test passed and panic gone. Patch gone too ;) I deleted it. I was hoping that you'd send me the final finished product (please). > thanks. > > > > if other caller give a invalid nid to alloc_pages_node(), __alloc_pages > > > will crash again. > > > > That would be a buggy caller, so we should fix that caller. > > > > > So I think we add some sanity check for nid at alloc_pages_node is > > > meaningful. > > > > > > another question, if (nid >= MAX_NUMNODES), may I set nid to 0 directly > > > like following code? > > > > > > if (unlikly(nid >= MAX_NUMNODES) > > > nid = 0 > > > > if (nid > MAX_NUMNODES) then that is a bug and we should report it (doing > > this via a BUG() is OK) rather than quietly covering it up. > > > > if (nid == MAX_NUMNODES) then we should set it to > > first_node(node_online_map); > > OK, will do it and send you a patch :) Thanks. Please cc Paul Jackson <pj@sgi.com> on that patch. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:15 ` Andrew Morton @ 2007-07-13 12:18 ` Joe Jin 2007-07-13 12:42 ` Paul Jackson 2007-07-14 17:40 ` Nish Aravamudan 0 siblings, 2 replies; 37+ messages in thread From: Joe Jin @ 2007-07-13 12:18 UTC (permalink / raw) To: Andrew Morton Cc: Joe Jin, bill.irwin, linux-kernel, gurudas.pai, Paul Jackson > > Patch gone too ;) I deleted it. I was hoping that you'd send me the final > finished product (please). > Ha.. here is the patch against 2.6.22. Your patch uses htlb_alloc_mask, but I cannot find it in the 2.6.22 kernel tree, so I think you must be using a different kernel tree :) Thanks, Joe --- linux-2.6.22/mm/hugetlb.c.orig 2007-07-12 15:02:19.000000000 +0800 +++ linux-2.6.22/mm/hugetlb.c 2007-07-13 17:33:45.000000000 +0800 @@ -101,13 +101,20 @@ static void free_huge_page(struct page * static int alloc_fresh_huge_page(void) { - static int nid = 0; + static int prev_nid; struct page *page; - page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, - HUGETLB_PAGE_ORDER); - nid = next_node(nid, node_online_map); + static DEFINE_SPINLOCK(nid_lock); + int nid; + + spin_lock(&nid_lock); + nid = next_node(prev_nid, node_online_map); if (nid == MAX_NUMNODES) nid = first_node(node_online_map); + prev_nid = nid; + spin_unlock(&nid_lock); + + page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, + HUGETLB_PAGE_ORDER); if (page) { set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 12:18 ` Joe Jin @ 2007-07-13 12:42 ` Paul Jackson 2007-07-14 17:40 ` Nish Aravamudan 1 sibling, 0 replies; 37+ messages in thread From: Paul Jackson @ 2007-07-13 12:42 UTC (permalink / raw) To: Joe Jin; +Cc: akpm, joe.jin, bill.irwin, linux-kernel, gurudas.pai Joe wrote: > --- linux-2.6.22/mm/hugetlb.c.orig 2007-07-12 15:02:19.000000000 +0800 > +++ linux-2.6.22/mm/hugetlb.c 2007-07-13 17:33:45.000000000 +0800 > @@ -101,13 +101,20 @@ static void free_huge_page(struct page * > > static int alloc_fresh_huge_page(void) > { > ... Patch looks ok to me, except for ... > Ha.., the patch against 2.6.22, at your patch have use htlb_alloc_mask, but I > cannot found it at 2.6.22 kernel tree, I think you must use difference kernel > tree :) Yes, as a matter of fact, he does have a different kernel tree, found at: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ This is the infamous *-mm tree, which is currently at the following version 2.6.22-rc6-mm1, and also available as a quilt patch set, beneath: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/ My guess is Andrew will prefer a patch against this *-mm kernel, though sometimes he is a good chap and fixes up minor differences, such as this involving htlb_alloc_mask replacing GFP_HIGHUSER. If you had the chance to verify your testing of this change against this 2.6.22-rc6-mm1, that would be worth doing. Probably the change to htlb_alloc_mask from GFP_HIGHUSER won't be any problem, but I suppose that there's a small chance that some other patch in *-mm will interact poorly with your proposed change. Thanks for getting on this so quickly. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 12:18 ` Joe Jin 2007-07-13 12:42 ` Paul Jackson @ 2007-07-14 17:40 ` Nish Aravamudan 2007-07-14 18:04 ` Andrew Morton 1 sibling, 1 reply; 37+ messages in thread From: Nish Aravamudan @ 2007-07-14 17:40 UTC (permalink / raw) To: Joe Jin; +Cc: Andrew Morton, bill.irwin, linux-kernel, gurudas.pai, Paul Jackson On 7/13/07, Joe Jin <joe.jin@oracle.com> wrote: > > > > Patch gone too ;) I deleted it. I was hoping that you'd send me the final > > finished product (please). > > > > Ha.., the patch against 2.6.22, at your patch have use htlb_alloc_mask, but I > cannot found it at 2.6.22 kernel tree, I think you must use difference kernel > tree :) I believe this patch will be unnecessary if my "Fix hugetlb pool allocation with empty nodes" patch gets pulled into -mm. alloc_fresh_huge_page() now takes a mempolicy which is created by the two callers, rather than reinventing interleaving itself. I believe this will avoid the oops you saw. I am still waiting on some test results (annoying -mm config changes) before I repost them (and they depend on Christoph's fixes for memoryless nodes). 
Thanks, Nish > --- linux-2.6.22/mm/hugetlb.c.orig 2007-07-12 15:02:19.000000000 +0800 > +++ linux-2.6.22/mm/hugetlb.c 2007-07-13 17:33:45.000000000 +0800 > @@ -101,13 +101,20 @@ static void free_huge_page(struct page * > > static int alloc_fresh_huge_page(void) > { > - static int nid = 0; > + static int prev_nid; > struct page *page; > - page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, > - HUGETLB_PAGE_ORDER); > - nid = next_node(nid, node_online_map); > + static DEFINE_SPINLOCK(nid_lock); > + int nid; > + > + spin_lock(&nid_lock); > + nid = next_node(prev_nid, node_online_map); > if (nid == MAX_NUMNODES) > nid = first_node(node_online_map); > + prev_nid = nid; > + spin_unlock(&nid_lock); > + > + page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, > + HUGETLB_PAGE_ORDER); > if (page) { > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-14 17:40 ` Nish Aravamudan @ 2007-07-14 18:04 ` Andrew Morton 2007-07-14 20:47 ` Nish Aravamudan 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-14 18:04 UTC (permalink / raw) To: Nish Aravamudan Cc: Joe Jin, bill.irwin, linux-kernel, gurudas.pai, Paul Jackson On Sat, 14 Jul 2007 10:40:25 -0700 "Nish Aravamudan" <nish.aravamudan@gmail.com> wrote: > On 7/13/07, Joe Jin <joe.jin@oracle.com> wrote: > > > > > > Patch gone too ;) I deleted it. I was hoping that you'd send me the final > > > finished product (please). > > > > > > > Ha.., the patch against 2.6.22, at your patch have use htlb_alloc_mask, but I > > cannot found it at 2.6.22 kernel tree, I think you must use difference kernel > > tree :) > > I believe this patch will be unnecessary if my "Fix hugetlb pool > allocation with empty nodes" patch gets pulled into -mm. > alloc_fresh_huge_page() now takes a mempolicy which is created by the > two callers, rather than reinventing interleaving itself. I believe > this will avoid the oops you saw. I am still waiting on some test > results (annoying -mm config changes) before I repost them (and they > depend on Christoph's fixes for memoryless nodes). Could be so, but the patch which I have queued is nice and simple and can be backported into 2.6.22.x. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-14 18:04 ` Andrew Morton @ 2007-07-14 20:47 ` Nish Aravamudan 0 siblings, 0 replies; 37+ messages in thread From: Nish Aravamudan @ 2007-07-14 20:47 UTC (permalink / raw) To: Andrew Morton Cc: Joe Jin, bill.irwin, linux-kernel, gurudas.pai, Paul Jackson On 7/14/07, Andrew Morton <akpm@linux-foundation.org> wrote: > On Sat, 14 Jul 2007 10:40:25 -0700 "Nish Aravamudan" <nish.aravamudan@gmail.com> wrote: > > > On 7/13/07, Joe Jin <joe.jin@oracle.com> wrote: > > > > > > > > Patch gone too ;) I deleted it. I was hoping that you'd send me the final > > > > finished product (please). > > > > > > > > > > Ha.., the patch against 2.6.22, at your patch have use htlb_alloc_mask, but I > > > cannot found it at 2.6.22 kernel tree, I think you must use difference kernel > > > tree :) > > > > I believe this patch will be unnecessary if my "Fix hugetlb pool > > allocation with empty nodes" patch gets pulled into -mm. > > alloc_fresh_huge_page() now takes a mempolicy which is created by the > > two callers, rather than reinventing interleaving itself. I believe > > this will avoid the oops you saw. I am still waiting on some test > > results (annoying -mm config changes) before I repost them (and they > > depend on Christoph's fixes for memoryless nodes). > > Could be so, but the patch which I have queued is nice and simple and > can be backported into 2.6.22.x. Fair enough. FWIW, just finished testing my patches and have posted the current incarnation to linux-mm. Thanks, Nish ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 6:49 ` Andrew Morton 2007-07-13 6:57 ` Andrew Morton 2007-07-13 8:03 ` Joe Jin @ 2007-07-13 8:04 ` gurudas pai 2007-07-13 8:19 ` Andrew Morton 2007-07-13 8:37 ` Joe Jin 3 siblings, 1 reply; 37+ messages in thread From: gurudas pai @ 2007-07-13 8:04 UTC (permalink / raw) To: akpm; +Cc: Joe Jin, bill.irwin, linux-kernel, GURUDAS,PAI Andrew Morton wrote: > On Fri, 13 Jul 2007 14:40:04 +0800 Joe Jin <joe.jin@oracle.com> wrote: > >> On 2007-07-12 22:18, Andrew Morton wrote: >>> On Fri, 13 Jul 2007 10:45:07 +0800 Joe Jin <joe.jin@oracle.com> wrote: >>> >>> Something like this? >>> >>> --- a/mm/hugetlb.c~a >>> +++ a/mm/hugetlb.c >>> @@ -105,13 +105,20 @@ static void free_huge_page(struct page * >>> >>> static int alloc_fresh_huge_page(void) >>> { >>> - static int nid = 0; >>> + static int prev_nid; >>> + static DEFINE_SPINLOCK(nid_lock); >>> struct page *page; >>> - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, >>> - HUGETLB_PAGE_ORDER); >>> - nid = next_node(nid, node_online_map); >>> + int nid; >>> + >>> + spin_lock(&nid_lock); >>> + nid = next_node(prev_nid, node_online_map); >>> if (nid == MAX_NUMNODES) >>> nid = first_node(node_online_map); >>> + prev_nid = nid; >>> + spin_unlock(&nid_lock); >>> + >>> + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, >>> + HUGETLB_PAGE_ORDER); >>> if (page) { >>> set_compound_page_dtor(page, free_huge_page); >>> spin_lock(&hugetlb_lock); >>> _ >>> I think this will never get pages from node 0 ? Because nid = next_node(prev_node,node_online_map) and even if prev_node = 0, nid will become 1. How about this patch ? 
--- linux-2.6.22.orig/mm/hugetlb.c 2007-07-08 16:32:17.000000000 -0700 +++ linux-2.6.22-devel/mm//hugetlb.c 2007-07-13 00:26:27.000000000 -0700 @@ -103,11 +103,18 @@ { static int nid = 0; struct page *page; - page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, - HUGETLB_PAGE_ORDER); + int cur_nid; + static DEFINE_SPINLOCK(nid_lock); + + spin_lock(&nid_lock); + cur_nid = nid; nid = next_node(nid, node_online_map); if (nid == MAX_NUMNODES) nid = first_node(node_online_map); + spin_unlock(&nid_lock); + + page = alloc_pages_node(cur_nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, + HUGETLB_PAGE_ORDER); if (page) { set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); Thanks, -Guru ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:04 ` gurudas pai @ 2007-07-13 8:19 ` Andrew Morton 2007-07-13 12:37 ` gurudas pai 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-13 8:19 UTC (permalink / raw) To: gurudas pai; +Cc: Joe Jin, bill.irwin, linux-kernel On Fri, 13 Jul 2007 13:34:29 +0530 gurudas pai <gurudas.pai@oracle.com> wrote: > > > Andrew Morton wrote: > > On Fri, 13 Jul 2007 14:40:04 +0800 Joe Jin <joe.jin@oracle.com> wrote: > > > >> On 2007-07-12 22:18, Andrew Morton wrote: > >>> On Fri, 13 Jul 2007 10:45:07 +0800 Joe Jin <joe.jin@oracle.com> wrote: > >>> > >>> Something like this? > >>> > >>> --- a/mm/hugetlb.c~a > >>> +++ a/mm/hugetlb.c > >>> @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > >>> > >>> static int alloc_fresh_huge_page(void) > >>> { > >>> - static int nid = 0; > >>> + static int prev_nid; > >>> + static DEFINE_SPINLOCK(nid_lock); > >>> struct page *page; > >>> - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > >>> - HUGETLB_PAGE_ORDER); > >>> - nid = next_node(nid, node_online_map); > >>> + int nid; > >>> + > >>> + spin_lock(&nid_lock); > >>> + nid = next_node(prev_nid, node_online_map); > >>> if (nid == MAX_NUMNODES) > >>> nid = first_node(node_online_map); > >>> + prev_nid = nid; > >>> + spin_unlock(&nid_lock); > >>> + > >>> + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > >>> + HUGETLB_PAGE_ORDER); > >>> if (page) { > >>> set_compound_page_dtor(page, free_huge_page); > >>> spin_lock(&hugetlb_lock); > >>> _ > >>> > I think this will never get pages from node 0 ? Because nid = > next_node(prev_node,node_online_map) and even if prev_node = 0, nid will > become 1. It'll start out at node 1. But it will visit the final node (which is less than MAX_NUMNODES) and will then advance onto the first node (which can be >= 0). This code needs a bit of thought and testing for the non-numa case too please. 
At the least, there might be optimisation opportunities. > How about this patch ? > > > --- linux-2.6.22.orig/mm/hugetlb.c 2007-07-08 16:32:17.000000000 -0700 > +++ linux-2.6.22-devel/mm//hugetlb.c 2007-07-13 00:26:27.000000000 -0700 > @@ -103,11 +103,18 @@ > { > static int nid = 0; > struct page *page; > - page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, > - HUGETLB_PAGE_ORDER); > + int cur_nid; > + static DEFINE_SPINLOCK(nid_lock); > + > + spin_lock(&nid_lock); > + cur_nid = nid; > nid = next_node(nid, node_online_map); > if (nid == MAX_NUMNODES) > nid = first_node(node_online_map); > + spin_unlock(&nid_lock); > + > + page = alloc_pages_node(cur_nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN, > + HUGETLB_PAGE_ORDER); > if (page) { > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); whoa, your email client made a huge mess of that one. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:19 ` Andrew Morton @ 2007-07-13 12:37 ` gurudas pai 0 siblings, 0 replies; 37+ messages in thread From: gurudas pai @ 2007-07-13 12:37 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel >>>>> On Fri, 13 Jul 2007 10:45:07 +0800 Joe Jin <joe.jin@oracle.com> wrote: >>>>> >>>>> Something like this? >>>>> >>>>> --- a/mm/hugetlb.c~a >>>>> +++ a/mm/hugetlb.c >>>>> @@ -105,13 +105,20 @@ static void free_huge_page(struct page * >>>>> >>>>> static int alloc_fresh_huge_page(void) >>>>> { >>>>> - static int nid = 0; >>>>> + static int prev_nid; >>>>> + static DEFINE_SPINLOCK(nid_lock); >>>>> struct page *page; >>>>> - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, >>>>> - HUGETLB_PAGE_ORDER); >>>>> - nid = next_node(nid, node_online_map); >>>>> + int nid; >>>>> + >>>>> + spin_lock(&nid_lock); >>>>> + nid = next_node(prev_nid, node_online_map); >>>>> if (nid == MAX_NUMNODES) >>>>> nid = first_node(node_online_map); >>>>> + prev_nid = nid; >>>>> + spin_unlock(&nid_lock); >>>>> + >>>>> + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, >>>>> + HUGETLB_PAGE_ORDER); >>>>> if (page) { >>>>> set_compound_page_dtor(page, free_huge_page); >>>>> spin_lock(&hugetlb_lock); >>>>> _ >>>>> >> I think this will never get pages from node 0 ? Because nid = >> next_node(prev_node,node_online_map) and even if prev_node = 0, nid will >> become 1. > > It'll start out at node 1. But it will visit the final node (which is less > than MAX_NUMNODES) and will then advance onto the fist node (which can be >> = 0). > > This code needs a bit of thought and testing for the non-numa case too > please. At the least, there might be optimisation opportunities. I tested on non-numa machine. Your patch works fine. Thanks, -Guru ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 6:49 ` Andrew Morton ` (2 preceding siblings ...) 2007-07-13 8:04 ` gurudas pai @ 2007-07-13 8:37 ` Joe Jin 2007-07-13 8:44 ` Andrew Morton 3 siblings, 1 reply; 37+ messages in thread From: Joe Jin @ 2007-07-13 8:37 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel, gurudas.pai > > if (nid > MAX_NUMNODES) then that is a bug and we should report it (doing > this via a BUG() is OK) rather than quietly covering it up. I have created a patch to check if nid > MAX_NUMNODES; please apply it. Thanks. Signed-off-by: Joe Jin <joe.jin@oracle.com> --- --- linux-2.6.22/include/linux/gfp.h.orig 2007-07-12 15:06:23.000000000 +0800 +++ linux-2.6.22/include/linux/gfp.h 2007-07-13 16:23:52.000000000 +0800 @@ -127,6 +127,8 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { + BUG_ON(nid > MAX_NUMNODES); + if (unlikely(order >= MAX_ORDER)) return NULL; ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 8:37 ` Joe Jin @ 2007-07-13 8:44 ` Andrew Morton 0 siblings, 0 replies; 37+ messages in thread From: Andrew Morton @ 2007-07-13 8:44 UTC (permalink / raw) To: Joe Jin; +Cc: bill.irwin, linux-kernel, gurudas.pai On Fri, 13 Jul 2007 16:37:32 +0800 Joe Jin <joe.jin@oracle.com> wrote: > > > > if (nid > MAX_NUMNODES) then that is a bug and we should report it (doing > > this via a BUG() is OK) rather than quietly covering it up. > > I have create a patch to check if nid > MAX_NUMNODES, please apply it > thanks > > Signed-off-by: Joe Jin <joe.jin@oracle.com> > --- > > --- linux-2.6.22/include/linux/gfp.h.orig 2007-07-12 15:06:23.000000000 +0800 > +++ linux-2.6.22/include/linux/gfp.h 2007-07-13 16:23:52.000000000 +0800 > @@ -127,6 +127,8 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i > static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, > unsigned int order) > { > + BUG_ON(nid > MAX_NUMNODES); > + > if (unlikely(order >= MAX_ORDER)) > return NULL; > nope ;) Would really prefer not to go adding overhead like this into a frequently-called and frequently-inlined codepath. If we do have a bug in a caller then the code will go on to overindex NODE_DATA() which will hopefully produce a nice oops for at least some people, and that's good enough. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-13 5:18 ` Andrew Morton 2007-07-13 6:40 ` Joe Jin @ 2007-07-17 15:04 ` Hugh Dickins 2007-07-17 16:32 ` Andrew Morton 1 sibling, 1 reply; 37+ messages in thread From: Hugh Dickins @ 2007-07-17 15:04 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel On Thu, 12 Jul 2007, Andrew Morton wrote: > > It'd be much better to fix the race within alloc_fresh_huge_page(). That > function is pretty pathetic. > > Something like this? > > --- a/mm/hugetlb.c~a > +++ a/mm/hugetlb.c > @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > > static int alloc_fresh_huge_page(void) > { > - static int nid = 0; > + static int prev_nid; > + static DEFINE_SPINLOCK(nid_lock); > struct page *page; > - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > - HUGETLB_PAGE_ORDER); > - nid = next_node(nid, node_online_map); > + int nid; > + > + spin_lock(&nid_lock); > + nid = next_node(prev_nid, node_online_map); > if (nid == MAX_NUMNODES) > nid = first_node(node_online_map); > + prev_nid = nid; > + spin_unlock(&nid_lock); > + > + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > + HUGETLB_PAGE_ORDER); > if (page) { > set_compound_page_dtor(page, free_huge_page); > spin_lock(&hugetlb_lock); Now that it's gone into the tree, I look at it and wonder, does your nid_lock really serve any purpose? We're just doing a simple assignment to prev_nid, and it doesn't matter if occasionally two racers choose the same node, and there's no protection here against a node being offlined before the alloc_pages_node anyway (unsupported? I'm ignorant). Hugh ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 15:04 ` Hugh Dickins @ 2007-07-17 16:32 ` Andrew Morton 2007-07-17 17:26 ` Hugh Dickins 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-17 16:32 UTC (permalink / raw) To: Hugh Dickins; +Cc: Joe Jin, bill.irwin, linux-kernel On Tue, 17 Jul 2007 16:04:54 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > On Thu, 12 Jul 2007, Andrew Morton wrote: > > > > It'd be much better to fix the race within alloc_fresh_huge_page(). That > > function is pretty pathetic. > > > > Something like this? > > > > --- a/mm/hugetlb.c~a > > +++ a/mm/hugetlb.c > > @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > > > > static int alloc_fresh_huge_page(void) > > { > > - static int nid = 0; > > + static int prev_nid; > > + static DEFINE_SPINLOCK(nid_lock); > > struct page *page; > > - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > - HUGETLB_PAGE_ORDER); > > - nid = next_node(nid, node_online_map); > > + int nid; > > + > > + spin_lock(&nid_lock); > > + nid = next_node(prev_nid, node_online_map); > > if (nid == MAX_NUMNODES) > > nid = first_node(node_online_map); > > + prev_nid = nid; > > + spin_unlock(&nid_lock); > > + > > + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > + HUGETLB_PAGE_ORDER); > > if (page) { > > set_compound_page_dtor(page, free_huge_page); > > spin_lock(&hugetlb_lock); > > Now that it's gone into the tree, I look at it and wonder, does your > nid_lock really serve any purpose? We're just doing a simple assignment > to prev_nid, and it doesn't matter if occasionally two racers choose the > same node, and there's no protection here against a node being offlined > before the alloc_pages_node anyway (unsupported? I'm ignorant). umm, actually, yes, the code as it happens to be structured does mean that there is no longer a way in which a race can cause us to pass MAX_NUMNODES into alloc_pages_node(). Or not. 
We can call next_node(MAX_NUMNODES, node_online_map) in that race window, with perhaps bad results. I think I like the lock ;) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 16:32 ` Andrew Morton @ 2007-07-17 17:26 ` Hugh Dickins 2007-07-17 18:58 ` Andrew Morton 0 siblings, 1 reply; 37+ messages in thread From: Hugh Dickins @ 2007-07-17 17:26 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel On Tue, 17 Jul 2007, Andrew Morton wrote: > On Tue, 17 Jul 2007 16:04:54 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > > On Thu, 12 Jul 2007, Andrew Morton wrote: > > > > > > It'd be much better to fix the race within alloc_fresh_huge_page(). That > > > function is pretty pathetic. > > > > > > Something like this? > > > > > > --- a/mm/hugetlb.c~a > > > +++ a/mm/hugetlb.c > > > @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > > > > > > static int alloc_fresh_huge_page(void) > > > { > > > - static int nid = 0; > > > + static int prev_nid; > > > + static DEFINE_SPINLOCK(nid_lock); > > > struct page *page; > > > - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > > - HUGETLB_PAGE_ORDER); > > > - nid = next_node(nid, node_online_map); > > > + int nid; > > > + > > > + spin_lock(&nid_lock); > > > + nid = next_node(prev_nid, node_online_map); > > > if (nid == MAX_NUMNODES) > > > nid = first_node(node_online_map); > > > + prev_nid = nid; > > > + spin_unlock(&nid_lock); > > > + > > > + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > > + HUGETLB_PAGE_ORDER); > > > if (page) { > > > set_compound_page_dtor(page, free_huge_page); > > > spin_lock(&hugetlb_lock); > > > > Now that it's gone into the tree, I look at it and wonder, does your > > nid_lock really serve any purpose? We're just doing a simple assignment > > to prev_nid, and it doesn't matter if occasionally two racers choose the > > same node, and there's no protection here against a node being offlined > > before the alloc_pages_node anyway (unsupported? I'm ignorant). 
> > umm, actually, yes, the code as it happens to be structured does mean that > ther is no longer a way in which a race can cause us to pass MAX_NUMNODES > into alloc_pages_node(). > > Or not. We can call next_node(MAX_NUMNODES, node_online_map) in that race > window, with perhaps bad results. > > I think I like the lock ;) I hate to waste your time, but I'm still puzzled. Wasn't the race fixed by your changeover from use of "static int nid" throughout, to setting local "int nid" from "static int prev_nid", working with nid, then setting prev_nid from nid at the end? What does the lock add to that? Hugh ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 17:26 ` Hugh Dickins @ 2007-07-17 18:58 ` Andrew Morton 2007-07-17 19:49 ` Hugh Dickins 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-17 18:58 UTC (permalink / raw) To: Hugh Dickins; +Cc: Joe Jin, bill.irwin, linux-kernel On Tue, 17 Jul 2007 18:26:14 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > On Tue, 17 Jul 2007, Andrew Morton wrote: > > On Tue, 17 Jul 2007 16:04:54 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > > > On Thu, 12 Jul 2007, Andrew Morton wrote: > > > > > > > > It'd be much better to fix the race within alloc_fresh_huge_page(). That > > > > function is pretty pathetic. > > > > > > > > Something like this? > > > > > > > > --- a/mm/hugetlb.c~a > > > > +++ a/mm/hugetlb.c > > > > @@ -105,13 +105,20 @@ static void free_huge_page(struct page * > > > > > > > > static int alloc_fresh_huge_page(void) > > > > { > > > > - static int nid = 0; > > > > + static int prev_nid; > > > > + static DEFINE_SPINLOCK(nid_lock); > > > > struct page *page; > > > > - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > > > - HUGETLB_PAGE_ORDER); > > > > - nid = next_node(nid, node_online_map); > > > > + int nid; > > > > + > > > > + spin_lock(&nid_lock); > > > > + nid = next_node(prev_nid, node_online_map); > > > > if (nid == MAX_NUMNODES) > > > > nid = first_node(node_online_map); > > > > + prev_nid = nid; > > > > + spin_unlock(&nid_lock); > > > > + > > > > + page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > > > + HUGETLB_PAGE_ORDER); > > > > if (page) { > > > > set_compound_page_dtor(page, free_huge_page); > > > > spin_lock(&hugetlb_lock); > > > > > > Now that it's gone into the tree, I look at it and wonder, does your > > > nid_lock really serve any purpose? 
We're just doing a simple assignment > > > to prev_nid, and it doesn't matter if occasionally two racers choose the > > > same node, and there's no protection here against a node being offlined > > > before the alloc_pages_node anyway (unsupported? I'm ignorant). > > > > umm, actually, yes, the code as it happens to be structured does mean that > > ther is no longer a way in which a race can cause us to pass MAX_NUMNODES > > into alloc_pages_node(). > > > > Or not. We can call next_node(MAX_NUMNODES, node_online_map) in that race > > window, with perhaps bad results. > > > > I think I like the lock ;) > > I hate to waste your time, but I'm still puzzled. Wasn't the race fixed > by your changeover from use of "static int nid" throughout, to setting > local "int nid" from "static int prev_nid", working with nid, then > setting prev_nid from nid at the end? What does the lock add to that? > There are still minor races without the lock - two CPUs will allocate from the first node, and prev_nid can occasionally go backwards. I agree that they are sufficiently minor that we could remove the lock. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 18:58 ` Andrew Morton @ 2007-07-17 19:49 ` Hugh Dickins 2007-07-17 20:01 ` Andrew Morton 0 siblings, 1 reply; 37+ messages in thread From: Hugh Dickins @ 2007-07-17 19:49 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel On Tue, 17 Jul 2007, Andrew Morton wrote: > On Tue, 17 Jul 2007 18:26:14 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > > > I think I like the lock ;) > > > > I hate to waste your time, but I'm still puzzled. Wasn't the race fixed > > by your changeover from use of "static int nid" throughout, to setting > > local "int nid" from "static int prev_nid", working with nid, then > > setting prev_nid from nid at the end? What does the lock add to that? > > There are still minor races without the lock - two CPUs will allocate > from the first node, and prev_nid can occasionally go backwards. > > I agree that they are sufficiently minor that we could remove the lock. Phew, yes, that's what I meant - thanks. [PATCH] Remove nid_lock from alloc_fresh_huge_page The fix to the race in alloc_fresh_huge_page() which could give an illegal node ID did not need nid_lock at all: the fix was to replace static int nid by static int prev_nid and do the work on local int nid. nid_lock did make sure that racers strictly roundrobin the nodes, but that's not something we actually need to enforce strictly. Kill that nid_lock. 
Signed-off-by: Hugh Dickins <hugh@veritas.com> --- mm/hugetlb.c | 3 --- 1 file changed, 3 deletions(-) --- 2.6.22-git9/mm/hugetlb.c 2007-07-17 20:29:33.000000000 +0100 +++ linux/mm/hugetlb.c 2007-07-17 20:32:51.000000000 +0100 @@ -107,15 +107,12 @@ static int alloc_fresh_huge_page(void) { static int prev_nid; struct page *page; - static DEFINE_SPINLOCK(nid_lock); int nid; - spin_lock(&nid_lock); nid = next_node(prev_nid, node_online_map); if (nid == MAX_NUMNODES) nid = first_node(node_online_map); prev_nid = nid; - spin_unlock(&nid_lock); page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, HUGETLB_PAGE_ORDER); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 19:49 ` Hugh Dickins @ 2007-07-17 20:01 ` Andrew Morton 2007-07-17 20:35 ` Hugh Dickins 0 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-17 20:01 UTC (permalink / raw) To: Hugh Dickins; +Cc: Joe Jin, bill.irwin, linux-kernel On Tue, 17 Jul 2007 20:49:25 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > --- 2.6.22-git9/mm/hugetlb.c 2007-07-17 20:29:33.000000000 +0100 > +++ linux/mm/hugetlb.c 2007-07-17 20:32:51.000000000 +0100 > @@ -107,15 +107,12 @@ static int alloc_fresh_huge_page(void) > { > static int prev_nid; > struct page *page; > - static DEFINE_SPINLOCK(nid_lock); > int nid; > > - spin_lock(&nid_lock); > nid = next_node(prev_nid, node_online_map); > if (nid == MAX_NUMNODES) > nid = first_node(node_online_map); > prev_nid = nid; > - spin_unlock(&nid_lock); Given that we've now gone and added deliberate-but-we-hope-benign races into this code, an elaborate comment which explains and justifies it all is pretty much obligatory, IMO. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 20:01 ` Andrew Morton @ 2007-07-17 20:35 ` Hugh Dickins 2007-07-18 1:40 ` Joe Jin 0 siblings, 1 reply; 37+ messages in thread From: Hugh Dickins @ 2007-07-17 20:35 UTC (permalink / raw) To: Andrew Morton; +Cc: Joe Jin, bill.irwin, linux-kernel On Tue, 17 Jul 2007, Andrew Morton wrote: > > Given that we've now gone and added deliberate-but-we-hope-benign > races into this code, an elaborate comment which explains and justifies > it all is pretty much obligatory, IMO. [PATCH] Remove nid_lock from alloc_fresh_huge_page The fix to that race in alloc_fresh_huge_page() which could give an illegal node ID did not need nid_lock at all: the fix was to replace static int nid by static int prev_nid and do the work on local int nid. nid_lock did make sure that racers strictly roundrobin the nodes, but that's not something we need to enforce strictly. Kill nid_lock. Signed-off-by: Hugh Dickins <hugh@veritas.com> --- mm/hugetlb.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) --- 2.6.22-git9/mm/hugetlb.c 2007-07-17 20:29:33.000000000 +0100 +++ linux/mm/hugetlb.c 2007-07-17 21:29:58.000000000 +0100 @@ -107,15 +107,19 @@ static int alloc_fresh_huge_page(void) { static int prev_nid; struct page *page; - static DEFINE_SPINLOCK(nid_lock); int nid; - spin_lock(&nid_lock); + /* + * Copy static prev_nid to local nid, work on that, then copy it + * back to prev_nid afterwards: otherwise there's a window in which + * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node. + * But we don't need to use a spin_lock here: it really doesn't + * matter if occasionally a racer chooses the same nid as we do. + */ nid = next_node(prev_nid, node_online_map); if (nid == MAX_NUMNODES) nid = first_node(node_online_map); prev_nid = nid; - spin_unlock(&nid_lock); page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, HUGETLB_PAGE_ORDER); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-17 20:35 ` Hugh Dickins @ 2007-07-18 1:40 ` Joe Jin 2007-07-18 4:49 ` Hugh Dickins 0 siblings, 1 reply; 37+ messages in thread From: Joe Jin @ 2007-07-18 1:40 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Joe Jin, bill.irwin, linux-kernel With your patch, I have reproduced the panic: Unable to handle kernel paging request at 000000000000186a RIP: [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3 PGD 72595067 PUD 72594067 PMD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand dm_mirror dm_multipath dm_mod video sbs button battery backlight ac snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sg snd_timer shpchp snd soundcore tg3 i2c_i801 piix i2c_core snd_page_alloc ide_cd cdrom serio_raw ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 3996, comm: sh Not tainted 2.6.22 #9 RIP: 0010:[<ffffffff8105d4e7>] [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3 RSP: 0018:ffff810071563e48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000e8d4a51000 RCX: 0000000000000000 RDX: ffff810071563fd8 RSI: 0000000000000009 RDI: 00000000000242d2 RBP: 00000000000242d2 R08: 0000000000000000 R09: 0000000000043a01 R10: ffff810000e88000 R11: ffffffff81072a02 R12: 0000000000001862 R13: ffff810070c74ea0 R14: 0000000000000009 R15: 00002adc51042000 FS: 00002adc4db3ddb0(0000) GS:ffffffff81312000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000000186a CR3: 000000007d963000 CR4: 00000000000006e0 Process sh (pid: 3996, threadinfo ffff810071562000, task ffff810070c74ea0) Stack: 00000010000242d2 ffff810000e88000 ffff810070e76710 ffffffff812e31e0 ffffffffffffffff 000000e8d4a51000 ffffffffffffffff ffff810077d11b00 000000000000000e ffff810071563f50 00002adc51042000 ffffffff81071a67 Call Trace: [<ffffffff81071a67>] 
alloc_fresh_huge_page+0x98/0xe7 [<ffffffff8107263b>] hugetlb_sysctl_handler+0x14/0xf1 [<ffffffff810bdad9>] proc_sys_write+0x7c/0xa6 [<ffffffff8108074b>] vfs_write+0xad/0x156 [<ffffffff81080ced>] sys_write+0x45/0x6e [<ffffffff81009c9c>] tracesys+0xdc/0xe1 Code: 49 83 7c 24 08 00 75 0e 48 c7 44 24 08 00 00 00 00 e9 6a 02 RIP [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3 RSP <ffff810071563e48> CR2: 000000000000186a From your patch, if we don't take the lock, the race condition may occur at next_node(). On 2007-07-17 21:35, Hugh Dickins wrote: > On Tue, 17 Jul 2007, Andrew Morton wrote: > > > > Given that we've now gone and added deliberate-but-we-hope-benign > > races into this code, an elaborate comment which explains and justifies > > it all is pretty much obligatory, IMO. > > [PATCH] Remove nid_lock from alloc_fresh_huge_page > > The fix to that race in alloc_fresh_huge_page() which could give an illegal > node ID did not need nid_lock at all: the fix was to replace static int nid > by static int prev_nid and do the work on local int nid. nid_lock did make > sure that racers strictly roundrobin the nodes, but that's not something we > need to enforce strictly. Kill nid_lock. > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > --- > mm/hugetlb.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > --- 2.6.22-git9/mm/hugetlb.c 2007-07-17 20:29:33.000000000 +0100 > +++ linux/mm/hugetlb.c 2007-07-17 21:29:58.000000000 +0100 > @@ -107,15 +107,19 @@ static int alloc_fresh_huge_page(void) > { > static int prev_nid; > struct page *page; > - static DEFINE_SPINLOCK(nid_lock); > int nid; > > - spin_lock(&nid_lock); > + /* > + * Copy static prev_nid to local nid, work on that, then copy it > + * back to prev_nid afterwards: otherwise there's a window in which > + * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node. > + * But we don't need to use a spin_lock here: it really doesn't > + * matter if occasionally a racer chooses the same nid as we do. 
> + */ > nid = next_node(prev_nid, node_online_map); > if (nid == MAX_NUMNODES) > nid = first_node(node_online_map); > prev_nid = nid; > - spin_unlock(&nid_lock); > > page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > HUGETLB_PAGE_ORDER); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-18 1:40 ` Joe Jin @ 2007-07-18 4:49 ` Hugh Dickins 2007-07-18 5:45 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Hugh Dickins @ 2007-07-18 4:49 UTC (permalink / raw) To: Joe Jin; +Cc: Andrew Morton, Oleg Nesterov, bill.irwin, linux-kernel On Wed, 18 Jul 2007, Joe Jin wrote: > > With your patch, I have reproduced the panic: That is... surprising to me. (I hadn't been able to reproduce it with or without the patches: maybe I just need to try harder.) Please post your gcc --version, and the disassembly (objdump -d) output for alloc_fresh_huge_page. Or can someone else make sense of this - Oleg? To me it still seems that nid_lock can only be irrelevant (doesn't even provide a compiler barrier between prev_nid and nid transactions). You remark lower down > From your patch, if we dont the lock, the race condition maybe occur > at next_node(). but I don't see how (if next_node were a macro which evaluates its args more than once, perhaps, but that doesn't seem to be the case). 
Thanks a lot, Hugh > > Unable to handle kernel paging request at 000000000000186a RIP: > [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3 > PGD 72595067 PUD 72594067 PMD 0 > Oops: 0000 [1] SMP > CPU 0 > Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand dm_mirror dm_multipath dm_mod video sbs button battery backlight ac snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sg snd_timer shpchp snd soundcore tg3 i2c_i801 piix i2c_core snd_page_alloc ide_cd cdrom serio_raw ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd > Pid: 3996, comm: sh Not tainted 2.6.22 #9 > RIP: 0010:[<ffffffff8105d4e7>] [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3 > RSP: 0018:ffff810071563e48 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: 000000e8d4a51000 RCX: 0000000000000000 > RDX: ffff810071563fd8 RSI: 0000000000000009 RDI: 00000000000242d2 > RBP: 00000000000242d2 R08: 0000000000000000 R09: 0000000000043a01 > R10: ffff810000e88000 R11: ffffffff81072a02 R12: 0000000000001862 > R13: ffff810070c74ea0 R14: 0000000000000009 R15: 00002adc51042000 > FS: 00002adc4db3ddb0(0000) GS:ffffffff81312000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 000000000000186a CR3: 000000007d963000 CR4: 00000000000006e0 > Process sh (pid: 3996, threadinfo ffff810071562000, task ffff810070c74ea0) > Stack: 00000010000242d2 ffff810000e88000 ffff810070e76710 ffffffff812e31e0 > ffffffffffffffff 000000e8d4a51000 ffffffffffffffff ffff810077d11b00 > 000000000000000e ffff810071563f50 00002adc51042000 ffffffff81071a67 > Call Trace: > [<ffffffff81071a67>] alloc_fresh_huge_page+0x98/0xe7 > [<ffffffff8107263b>] hugetlb_sysctl_handler+0x14/0xf1 > [<ffffffff810bdad9>] proc_sys_write+0x7c/0xa6 > [<ffffffff8108074b>] vfs_write+0xad/0x156 > [<ffffffff81080ced>] sys_write+0x45/0x6e > [<ffffffff81009c9c>] tracesys+0xdc/0xe1 > > > Code: 49 83 7c 24 08 00 75 0e 48 c7 
44 24 08 00 00 00 00 e9 6a 02 > RIP [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3 > RSP <ffff810071563e48> > CR2: 000000000000186a > > >From your patch, if we dont the lock, the race condition maybe occur at > next_node(). > > On 2007-07-17 21:35, Hugh Dickins wrote: > > On Tue, 17 Jul 2007, Andrew Morton wrote: > > > > > > Given that we've now gone and added deliberate-but-we-hope-benign > > > races into this code, an elaborate comment which explains and justifies > > > it all is pretty much obligatory, IMO. > > > > [PATCH] Remove nid_lock from alloc_fresh_huge_page > > > > The fix to that race in alloc_fresh_huge_page() which could give an illegal > > node ID did not need nid_lock at all: the fix was to replace static int nid > > by static int prev_nid and do the work on local int nid. nid_lock did make > > sure that racers strictly roundrobin the nodes, but that's not something we > > need to enforce strictly. Kill nid_lock. > > > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > > --- > > mm/hugetlb.c | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > --- 2.6.22-git9/mm/hugetlb.c 2007-07-17 20:29:33.000000000 +0100 > > +++ linux/mm/hugetlb.c 2007-07-17 21:29:58.000000000 +0100 > > @@ -107,15 +107,19 @@ static int alloc_fresh_huge_page(void) > > { > > static int prev_nid; > > struct page *page; > > - static DEFINE_SPINLOCK(nid_lock); > > int nid; > > > > - spin_lock(&nid_lock); > > + /* > > + * Copy static prev_nid to local nid, work on that, then copy it > > + * back to prev_nid afterwards: otherwise there's a window in which > > + * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node. > > + * But we don't need to use a spin_lock here: it really doesn't > > + * matter if occasionally a racer chooses the same nid as we do. 
> > + */ > > nid = next_node(prev_nid, node_online_map); > > if (nid == MAX_NUMNODES) > > nid = first_node(node_online_map); > > prev_nid = nid; > > - spin_unlock(&nid_lock); > > > > page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN, > > HUGETLB_PAGE_ORDER); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-18 4:49 ` Hugh Dickins @ 2007-07-18 5:45 ` Andrew Morton 2007-07-18 7:34 ` Joe Jin 2007-07-18 6:32 ` Joe Jin 2007-07-18 8:09 ` Joe Jin 2 siblings, 1 reply; 37+ messages in thread From: Andrew Morton @ 2007-07-18 5:45 UTC (permalink / raw) To: Hugh Dickins; +Cc: Joe Jin, Oleg Nesterov, bill.irwin, linux-kernel On Wed, 18 Jul 2007 05:49:50 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > > With your patch, I have reproduced the panic: > > That is... surprising to me. To me also. I'd like to double-check the code which Joe actually tested, please - have a sneaking suspicion that the 2.6.22 version of that function was tested ;) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-18 5:45 ` Andrew Morton @ 2007-07-18 7:34 ` Joe Jin 0 siblings, 0 replies; 37+ messages in thread From: Joe Jin @ 2007-07-18 7:34 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Joe Jin, Oleg Nesterov, bill.irwin, linux-kernel > > > > That is... surprising to me. > > To me also. I'd like to double-check the code which Joe actually > tested, please - have a sneaking suspicion that the 2.6.22 version > of that function was tested ;) Maybe I made a mistake? I'll test it again and then give you a report. Thanks, Joe ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-18 4:49 ` Hugh Dickins 2007-07-18 5:45 ` Andrew Morton @ 2007-07-18 6:32 ` Joe Jin 2007-07-18 8:09 ` Joe Jin 2 siblings, 0 replies; 37+ messages in thread From: Joe Jin @ 2007-07-18 6:32 UTC (permalink / raw) To: Hugh Dickins Cc: Joe Jin, Andrew Morton, Oleg Nesterov, bill.irwin, linux-kernel On 2007-07-18 05:49, Hugh Dickins wrote: > On Wed, 18 Jul 2007, Joe Jin wrote: > > > > With your patch, I have reproduced the panic: > > That is... surprising to me. (I hadn't been able to reproduce it with > or without the patches: maybe I just need to try harder.) Please post > your gcc --version, and the disassembly (objdump -d) output for > alloc_fresh_huge_page. > It is easy to reproduce: just run the following script in *2* different terminals at the same time; I think you will hit it in a short time. #!/bin/bash while : ; do echo 1000000000000 > /proc/sys/vm/nr_hugepages echo 1 > /proc/sys/vm/nr_hugepages echo 10000000000000000000 > /proc/sys/vm/nr_hugepages echo 0 > /proc/sys/vm/nr_hugepages done # gcc --version gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
------------------------------------------------------------------------------------------------------------------------

Disassembly of alloc_fresh_huge_page with gdb:

0xffffffff810719cf <alloc_fresh_huge_page+0>:   mov 3651606(%rip),%r8d  # 0xffffffff813ed1ec <prev_nid.16422>
0xffffffff810719d6 <alloc_fresh_huge_page+7>:   mov 2841027(%rip),%rdi  # 0xffffffff813273a0 <node_online_map>
0xffffffff810719dd <alloc_fresh_huge_page+14>:  mov $0x3f,%eax
0xffffffff810719e2 <alloc_fresh_huge_page+19>:  push %rbx
0xffffffff810719e3 <alloc_fresh_huge_page+20>:  lea 0x1(%r8),%esi
0xffffffff810719e7 <alloc_fresh_huge_page+24>:  sub %r8d,%eax
0xffffffff810719ea <alloc_fresh_huge_page+27>:  mov %rdi,%rdx
0xffffffff810719ed <alloc_fresh_huge_page+30>:  cltq
0xffffffff810719ef <alloc_fresh_huge_page+32>:  mov %sil,%cl
0xffffffff810719f2 <alloc_fresh_huge_page+35>:  shr %cl,%rdx
0xffffffff810719f5 <alloc_fresh_huge_page+38>:  bsf %rdx,%rcx
0xffffffff810719f9 <alloc_fresh_huge_page+42>:  cmove %rax,%rcx
0xffffffff810719fd <alloc_fresh_huge_page+46>:  lea (%rsi,%rcx,1),%ecx
0xffffffff81071a00 <alloc_fresh_huge_page+49>:  mov $0x40,%eax
0xffffffff81071a05 <alloc_fresh_huge_page+54>:  cmp $0x40,%ecx
0xffffffff81071a08 <alloc_fresh_huge_page+57>:  cmovg %eax,%ecx
0xffffffff81071a0b <alloc_fresh_huge_page+60>:  cmp $0x40,%ecx
0xffffffff81071a0e <alloc_fresh_huge_page+63>:  mov %ecx,3651540(%rip)  # 0xffffffff813ed1e8 <nid.16423>
0xffffffff81071a14 <alloc_fresh_huge_page+69>:  jne 0xffffffff81071a2f <alloc_fresh_huge_page+96>
0xffffffff81071a16 <alloc_fresh_huge_page+71>:  mov $0x40,%eax
0xffffffff81071a1b <alloc_fresh_huge_page+76>:  bsf %rdi,%rdx
0xffffffff81071a1f <alloc_fresh_huge_page+80>:  cmove %rax,%rdx
0xffffffff81071a23 <alloc_fresh_huge_page+84>:  cmp $0x40,%edx
0xffffffff81071a26 <alloc_fresh_huge_page+87>:  cmovle %edx,%ecx
0xffffffff81071a29 <alloc_fresh_huge_page+90>:  mov %ecx,3651513(%rip)  # 0xffffffff813ed1e8 <nid.16423>
0xffffffff81071a2f <alloc_fresh_huge_page+96>:  mov 3651507(%rip),%eax  # 0xffffffff813ed1e8 <nid.16423>
0xffffffff81071a35 <alloc_fresh_huge_page+102>: test %eax,%eax
0xffffffff81071a37 <alloc_fresh_huge_page+104>: mov %eax,3651503(%rip)  # 0xffffffff813ed1ec <prev_nid.16422>
0xffffffff81071a3d <alloc_fresh_huge_page+110>: jns 0xffffffff81071a47 <alloc_fresh_huge_page+120>
0xffffffff81071a3f <alloc_fresh_huge_page+112>: mov %gs:0x30,%eax
0xffffffff81071a47 <alloc_fresh_huge_page+120>: cltq
0xffffffff81071a49 <alloc_fresh_huge_page+122>: mov $0x9,%esi
0xffffffff81071a4e <alloc_fresh_huge_page+127>: mov $0x242d2,%edi
0xffffffff81071a53 <alloc_fresh_huge_page+132>: mov 0xffffffff813260e0(,%rax,8),%rdx
0xffffffff81071a5b <alloc_fresh_huge_page+140>: add $0x1860,%rdx
0xffffffff81071a62 <alloc_fresh_huge_page+147>: callq 0xffffffff8105d4b8 <__alloc_pages>
0xffffffff81071a67 <alloc_fresh_huge_page+152>: mov %rax,%rbx
0xffffffff81071a6a <alloc_fresh_huge_page+155>: xor %eax,%eax
0xffffffff81071a6c <alloc_fresh_huge_page+157>: test %rbx,%rbx
0xffffffff81071a6f <alloc_fresh_huge_page+160>: je 0xffffffff81071ab4 <alloc_fresh_huge_page+229>
0xffffffff81071a71 <alloc_fresh_huge_page+162>: movq $0xffffffff81072a02,0x60(%rbx)
0xffffffff81071a79 <alloc_fresh_huge_page+170>: mov $0xffffffff812e6c68,%rdi
0xffffffff81071a80 <alloc_fresh_huge_page+177>: callq 0xffffffff81217952 <_spin_lock>
0xffffffff81071a85 <alloc_fresh_huge_page+182>: incq 3651428(%rip)  # 0xffffffff813ed1f0 <nr_huge_pages>
0xffffffff81071a8c <alloc_fresh_huge_page+189>: mov (%rbx),%rax
0xffffffff81071a8f <alloc_fresh_huge_page+192>: shr $0x2d,%rax
0xffffffff81071a93 <alloc_fresh_huge_page+196>: and $0x3f,%eax
0xffffffff81071a96 <alloc_fresh_huge_page+199>: incl 0xffffffff813ed620(,%rax,4)
0xffffffff81071a9d <alloc_fresh_huge_page+206>: movl $0x1,2576833(%rip)  # 0xffffffff812e6c68 <hugetlb_lock>
0xffffffff81071aa7 <alloc_fresh_huge_page+216>: mov %rbx,%rdi
0xffffffff81071aaa <alloc_fresh_huge_page+219>: callq 0xffffffff8105f9cc <put_page>
0xffffffff81071aaf <alloc_fresh_huge_page+224>: mov $0x1,%eax
0xffffffff81071ab4 <alloc_fresh_huge_page+229>: pop %rbx
0xffffffff81071ab5 <alloc_fresh_huge_page+230>: retq

Thanks,
Joe

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-18 4:49 ` Hugh Dickins 2007-07-18 5:45 ` Andrew Morton 2007-07-18 6:32 ` Joe Jin @ 2007-07-18 8:09 ` Joe Jin 2007-07-18 8:35 ` Hugh Dickins 2 siblings, 1 reply; 37+ messages in thread From: Joe Jin @ 2007-07-18 8:09 UTC (permalink / raw) To: Hugh Dickins Cc: Joe Jin, Andrew Morton, Oleg Nesterov, bill.irwin, linux-kernel

Sorry, I was using an unclean kernel tree; the panic has gone ;)
Also, I will test it again on another server; if I hit it, I'll let you know.

Thanks,
Joe

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Add nid sanity on alloc_pages_node 2007-07-18 8:09 ` Joe Jin @ 2007-07-18 8:35 ` Hugh Dickins 0 siblings, 0 replies; 37+ messages in thread From: Hugh Dickins @ 2007-07-18 8:35 UTC (permalink / raw) To: Joe Jin; +Cc: Andrew Morton, Oleg Nesterov, bill.irwin, linux-kernel

On Wed, 18 Jul 2007, Joe Jin wrote:
> Sorry, I was using an unclean kernel tree; the panic has gone ;)
> Also, I will test it again on another server; if I hit it, I'll let you know.

That's a relief!

Thanks,
Hugh

^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads: [~2007-07-18 9:01 UTC | newest]

Thread overview: 37+ messages:

2007-07-13  2:45 [PATCH] Add nid sanity on alloc_pages_node Joe Jin
2007-07-13  5:18 ` Andrew Morton
2007-07-13  6:40   ` Joe Jin
2007-07-13  6:49     ` Andrew Morton
2007-07-13  6:57       ` Andrew Morton
2007-07-13  8:29         ` Paul Jackson
2007-07-13  8:38           ` Andrew Morton
2007-07-13  8:43             ` Paul Jackson
2007-07-13  8:49               ` Andrew Morton
2007-07-13  8:54                 ` Paul Jackson
2007-07-13 12:48                 ` Benjamin Herrenschmidt
2007-07-13  8:03       ` Joe Jin
2007-07-13  8:15         ` Andrew Morton
2007-07-13 12:18           ` Joe Jin
2007-07-13 12:42             ` Paul Jackson
2007-07-14 17:40             ` Nish Aravamudan
2007-07-14 18:04               ` Andrew Morton
2007-07-14 20:47                 ` Nish Aravamudan
2007-07-13  8:04       ` gurudas pai
2007-07-13  8:19         ` Andrew Morton
2007-07-13 12:37           ` gurudas pai
2007-07-13  8:37       ` Joe Jin
2007-07-13  8:44         ` Andrew Morton
2007-07-17 15:04 ` Hugh Dickins
2007-07-17 16:32   ` Andrew Morton
2007-07-17 17:26     ` Hugh Dickins
2007-07-17 18:58       ` Andrew Morton
2007-07-17 19:49         ` Hugh Dickins
2007-07-17 20:01           ` Andrew Morton
2007-07-17 20:35             ` Hugh Dickins
2007-07-18  1:40               ` Joe Jin
2007-07-18  4:49                 ` Hugh Dickins
2007-07-18  5:45                   ` Andrew Morton
2007-07-18  7:34                     ` Joe Jin
2007-07-18  6:32                   ` Joe Jin
2007-07-18  8:09                   ` Joe Jin
2007-07-18  8:35                     ` Hugh Dickins