* [RFC 0/2] harden alloc_pages against bogus nid
From: Jeremy Linton
Date: 2018-08-01 20:04 UTC
To: linux-arm-kernel

The thread "avoid alloc memory on offline node"

    https://lkml.org/lkml/2018/6/7/251

asked at one point why kzalloc_node() was crashing rather than
returning memory from a valid node. That thread ended up fixing the
immediate causes of the crash, but left open the case of bad proximity
values appearing in DSDT tables without corresponding SRAT/SLIT
entries, as is happening on another machine.

That is also easy to fix, but we should additionally harden the
allocator sufficiently that it doesn't crash when passed an invalid
node id. There are a couple of possible ways to do this, and I've
attached two separate patches which individually fix that problem.

The first detects the offline node before calling into the new_slab()
code path, once it becomes apparent that the allocation isn't going to
succeed. The second actually hardens node_zonelist() and
prepare_alloc_pages() in the face of NODE_DATA(nid) returning a NULL
zonelist. This latter case happens if the node has never been
initialized or is possibly out of range. There are other places
(NODE_DATA() & node_online()) which should be checking whether node
ids are >= MAX_NUMNODES.

Jeremy Linton (2):
  slub: Avoid trying to allocate memory on offline nodes
  mm: harden alloc_pages code paths against bogus nodes

 include/linux/gfp.h | 2 ++
 mm/page_alloc.c     | 2 ++
 mm/slub.c           | 2 ++
 3 files changed, 6 insertions(+)

--
2.14.3
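[Editor's sketch] For context, a minimal illustration of the failure
mode the cover letter describes. This is hypothetical driver code:
my_port_alloc() and struct my_port are made up for illustration, while
dev_to_node() and kzalloc_node() are real kernel APIs.

	#include <linux/device.h>
	#include <linux/slab.h>

	struct my_port { int id; };	/* hypothetical structure */

	/*
	 * The node id comes from firmware via the device, not from any
	 * sanitized kernel source.  If firmware reported a proximity
	 * domain with no SRAT/SLIT entry, nid may name a node that was
	 * never initialized, and the allocation crashes inside the
	 * page allocator instead of failing or falling back.
	 */
	static struct my_port *my_port_alloc(struct device *dev)
	{
		int nid = dev_to_node(dev);	/* may be bogus */

		return kzalloc_node(sizeof(struct my_port),
				    GFP_KERNEL, nid);
	}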
* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
From: Jeremy Linton
Date: 2018-08-01 20:04 UTC
To: linux-arm-kernel

If a user calls one of the *alloc_node() functions with an invalid
node, it's possible to crash in __alloc_pages_nodemask() because
NODE_DATA() returns a bad pglist_data pointer, which propagates into
the node zonelist in prepare_alloc_pages(). Avoid that by not trying
to allocate new slabs against offline nodes.

(example backtrace)
  __alloc_pages_nodemask+0x128/0xf48
  allocate_slab+0x94/0x528
  new_slab+0x68/0xc8
  ___slab_alloc+0x44c/0x520
  __slab_alloc+0x50/0x68
  kmem_cache_alloc_node_trace+0xe0/0x230

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 mm/slub.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 51258eff4178..e03719bac1e2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		if (unlikely(!node_match(page, searchnode))) {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
+			if (!node_online(searchnode))
+				node = NUMA_NO_NODE;
 			goto new_slab;
 		}
 	}
--
2.14.3
* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
From: Michal Hocko
Date: 2018-08-02 9:15 UTC
To: linux-arm-kernel

On Wed 01-08-18 15:04:17, Jeremy Linton wrote:
[...]
> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		if (unlikely(!node_match(page, searchnode))) {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
> +			if (!node_online(searchnode))
> +				node = NUMA_NO_NODE;
>  			goto new_slab;

This is inherently racy. A NUMA node can go offline at any point after
you check it here. Making it race free would involve some sort of
locking, and I am not really convinced this is a good idea.

>  		}
>  	}
> --
> 2.14.3

--
Michal Hocko
SUSE Labs
* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
From: Jeremy Linton
Date: 2018-08-03 3:21 UTC
To: linux-arm-kernel

Hi,

On 08/02/2018 04:15 AM, Michal Hocko wrote:
> On Wed 01-08-18 15:04:17, Jeremy Linton wrote:
> [...]
>> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  		if (unlikely(!node_match(page, searchnode))) {
>>  			stat(s, ALLOC_NODE_MISMATCH);
>>  			deactivate_slab(s, page, c->freelist, c);
>> +			if (!node_online(searchnode))
>> +				node = NUMA_NO_NODE;
>>  			goto new_slab;
>
> This is inherently racy. A NUMA node can go offline at any point after
> you check it here. Making it race free would involve some sort of
> locking, and I am not really convinced this is a good idea.

I spent some time looking/thinking about this, and I'm pretty sure it's
not creating any new problems. But OTOH, I think the node_online()
check is probably a bit misleading, as what we really want to assure is
that node < MAX_NUMNODES and that there is going to be a valid entry in
NODE_DATA(), so we don't dereference NULL.
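[Editor's sketch] The weaker test being described here might be
spelled like this. node_valid() is a hypothetical helper, not an
existing kernel API, and the sketch assumes the usual
node_data[]-backed NODE_DATA() implementation:

	#include <linux/mmzone.h>
	#include <linux/numa.h>

	/*
	 * The minimum condition needed to avoid the NULL dereference.
	 * node_online(nid) implies this, but a node can be initialized
	 * without being online, so this check is strictly weaker.
	 */
	static inline bool node_valid(int nid)
	{
		return nid >= 0 && nid < MAX_NUMNODES && NODE_DATA(nid);
	}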
* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
From: Michal Hocko
Date: 2018-08-03 6:20 UTC
To: linux-arm-kernel

On Thu 02-08-18 22:21:53, Jeremy Linton wrote:
> Hi,
>
> On 08/02/2018 04:15 AM, Michal Hocko wrote:
>> On Wed 01-08-18 15:04:17, Jeremy Linton wrote:
>> [...]
>>> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>>  		if (unlikely(!node_match(page, searchnode))) {
>>>  			stat(s, ALLOC_NODE_MISMATCH);
>>>  			deactivate_slab(s, page, c->freelist, c);
>>> +			if (!node_online(searchnode))
>>> +				node = NUMA_NO_NODE;
>>>  			goto new_slab;
>>
>> This is inherently racy. A NUMA node can go offline at any point after
>> you check it here. Making it race free would involve some sort of
>> locking, and I am not really convinced this is a good idea.
>
> I spent some time looking/thinking about this, and I'm pretty sure it's
> not creating any new problems. But OTOH, I think the node_online()
> check is probably a bit misleading, as what we really want to assure is
> that node < MAX_NUMNODES and that there is going to be a valid entry in
> NODE_DATA(), so we don't dereference NULL.

Exactly. And we do rely on users of the allocator not passing bogus
parameters. This is not a function to be used for untrusted or
unsanitized inputs.

--
Michal Hocko
SUSE Labs
* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
From: Christopher Lameter
Date: 2018-08-02 14:23 UTC
To: linux-arm-kernel

On Wed, 1 Aug 2018, Jeremy Linton wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index 51258eff4178..e03719bac1e2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		if (unlikely(!node_match(page, searchnode))) {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
> +			if (!node_online(searchnode))
> +				node = NUMA_NO_NODE;
>  			goto new_slab;
>  		}
>  	}

Would it not be better to implement this check in the page allocator?
There is also the issue of how to fall back to the nearest node.

NUMA_NO_NODE should fall back to the current memory allocation policy,
but it seems that by inserting it here you would end up just with the
default node for the processor.
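[Editor's sketch] For reference, one place such a check could live
instead, modeled on the 4.18-era __alloc_pages_node() in
include/linux/gfp.h. The node_online() fallback is the hypothetical
addition; the rest paraphrases the existing helper from memory and may
differ in detail:

	static inline struct page *
	__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
	{
		VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);

		/* hypothetical hardening: fall back rather than crash */
		if (unlikely(!node_online(nid)))
			nid = numa_mem_id();

		return __alloc_pages(gfp_mask, order, nid);
	}

Note that this falls back to numa_mem_id() (the local node), which has
exactly the policy-bypass problem Christopher describes above, rather
than walking the current mempolicy.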
* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
From: Jeremy Linton
Date: 2018-08-03 3:12 UTC
To: linux-arm-kernel

Hi,

On 08/02/2018 09:23 AM, Christopher Lameter wrote:
> On Wed, 1 Aug 2018, Jeremy Linton wrote:
>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 51258eff4178..e03719bac1e2 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  		if (unlikely(!node_match(page, searchnode))) {
>>  			stat(s, ALLOC_NODE_MISMATCH);
>>  			deactivate_slab(s, page, c->freelist, c);
>> +			if (!node_online(searchnode))
>> +				node = NUMA_NO_NODE;
>>  			goto new_slab;
>>  		}
>>  	}
>
> Would it not be better to implement this check in the page allocator?
> There is also the issue of how to fall back to the nearest node.

Possibly? Falling back to the nearest node, though, should be handled
if memory-less nodes are enabled, which in the problematic case they
aren't.

> NUMA_NO_NODE should fall back to the current memory allocation policy,
> but it seems that by inserting it here you would end up just with the
> default node for the processor.

I picked this spot (compared to 2/2) because a number of paths funnel
through here, and in this case it shouldn't be a very hot path.
* [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
From: Jeremy Linton
Date: 2018-08-01 20:04 UTC
To: linux-arm-kernel

It's possible to crash __alloc_pages_nodemask() by passing it bogus
node ids. This is caused by NODE_DATA() returning NULL (hopefully)
when the requested node is offline. We can harden against the basic
case of a mostly valid node that isn't online by checking for NULL
and failing prepare_alloc_pages().

But this then suggests we should also harden NODE_DATA() like this:

#define NODE_DATA(nid) ((nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)

Eventually this starts to add a bunch of generally unneeded checks in
some code paths that are called quite frequently.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 include/linux/gfp.h | 2 ++
 mm/page_alloc.c     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a6afcec53795..17d70271c42e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -436,6 +436,8 @@ static inline int gfp_zonelist(gfp_t flags)
  */
 static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
 {
+	if (unlikely(!NODE_DATA(nid))) /* VM_WARN_ON? */
+		return NULL;
 	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
 }

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a790ef4be74e..3a3d9ac2662a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4306,6 +4306,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 {
 	ac->high_zoneidx = gfp_zone(gfp_mask);
 	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
+	if (!ac->zonelist)
+		return false;
 	ac->nodemask = nodemask;
 	ac->migratetype = gfpflags_to_migratetype(gfp_mask);

--
2.14.3
* [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
From: Michal Hocko
Date: 2018-08-02 7:31 UTC
To: linux-arm-kernel

On Wed 01-08-18 15:04:18, Jeremy Linton wrote:
> It's possible to crash __alloc_pages_nodemask() by passing it bogus
> node ids. This is caused by NODE_DATA() returning NULL (hopefully)
> when the requested node is offline. We can harden against the basic
> case of a mostly valid node that isn't online by checking for NULL
> and failing prepare_alloc_pages().
>
> But this then suggests we should also harden NODE_DATA() like this:
>
> #define NODE_DATA(nid) ((nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)
>
> Eventually this starts to add a bunch of generally unneeded checks in
> some code paths that are called quite frequently.

But the page allocator is really a hot path, and people will not be
happy to have yet another branch there. No code should really use
invalid NUMA node ids in the first place.

If I remember those bugs correctly, it was the arch code which was
doing something wrong. I would prefer that code to be fixed instead.

> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  include/linux/gfp.h | 2 ++
>  mm/page_alloc.c     | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index a6afcec53795..17d70271c42e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -436,6 +436,8 @@ static inline int gfp_zonelist(gfp_t flags)
>   */
>  static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
>  {
> +	if (unlikely(!NODE_DATA(nid))) /* VM_WARN_ON? */
> +		return NULL;
>  	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
>  }
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a790ef4be74e..3a3d9ac2662a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4306,6 +4306,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>  {
>  	ac->high_zoneidx = gfp_zone(gfp_mask);
>  	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> +	if (!ac->zonelist)
> +		return false;
>  	ac->nodemask = nodemask;
>  	ac->migratetype = gfpflags_to_migratetype(gfp_mask);
>
> --
> 2.14.3

--
Michal Hocko
SUSE Labs
* [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
From: Jeremy Linton
Date: 2018-08-03 3:17 UTC
To: linux-arm-kernel

Hi,

On 08/02/2018 02:31 AM, Michal Hocko wrote:
> On Wed 01-08-18 15:04:18, Jeremy Linton wrote:
>> It's possible to crash __alloc_pages_nodemask() by passing it bogus
>> node ids. This is caused by NODE_DATA() returning NULL (hopefully)
>> when the requested node is offline. We can harden against the basic
>> case of a mostly valid node that isn't online by checking for NULL
>> and failing prepare_alloc_pages().
>>
>> But this then suggests we should also harden NODE_DATA() like this:
>>
>> #define NODE_DATA(nid) ((nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)
>>
>> Eventually this starts to add a bunch of generally unneeded checks in
>> some code paths that are called quite frequently.
>
> But the page allocator is really a hot path, and people will not be
> happy to have yet another branch there. No code should really use
> invalid NUMA node ids in the first place.
>
> If I remember those bugs correctly, it was the arch code which was
> doing something wrong. I would prefer that code to be fixed instead.

Yes, I think the consensus is that 2/2 should be dropped.

The arch code is being fixed (both cases); this patch set is just an
attempt to harden this code path against future failures like that, so
that we get some warnings/ugly messages rather than early boot
failures.

Thanks,
* [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
From: Michal Hocko
Date: 2018-08-03 6:24 UTC
To: linux-arm-kernel

On Thu 02-08-18 22:17:49, Jeremy Linton wrote:
> Hi,
>
> On 08/02/2018 02:31 AM, Michal Hocko wrote:
>> On Wed 01-08-18 15:04:18, Jeremy Linton wrote:
>>> It's possible to crash __alloc_pages_nodemask() by passing it bogus
>>> node ids. This is caused by NODE_DATA() returning NULL (hopefully)
>>> when the requested node is offline. We can harden against the basic
>>> case of a mostly valid node that isn't online by checking for NULL
>>> and failing prepare_alloc_pages().
>>>
>>> But this then suggests we should also harden NODE_DATA() like this:
>>>
>>> #define NODE_DATA(nid) ((nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)
>>>
>>> Eventually this starts to add a bunch of generally unneeded checks in
>>> some code paths that are called quite frequently.
>>
>> But the page allocator is really a hot path, and people will not be
>> happy to have yet another branch there. No code should really use
>> invalid NUMA node ids in the first place.
>>
>> If I remember those bugs correctly, it was the arch code which was
>> doing something wrong. I would prefer that code to be fixed instead.
>
> Yes, I think the consensus is that 2/2 should be dropped.
>
> The arch code is being fixed (both cases); this patch set is just an
> attempt to harden this code path against future failures like that, so
> that we get some warnings/ugly messages rather than early boot
> failures.

Hmm, this is a completely different story. We do have VM_{BUG,WARN}_ON,
which are noops for most configurations. It is primarily meant to be
enabled for developers or special debug kernels. If you have an example
where such an early splat in the log would save a lot of head
scratching, then that would sound like a reasonable justification to
add VM_WARN_ON(!NODE_DATA(nid)) into the page allocator, me thinks.

But considering that we should get a NULL ptr splat anyway, I am not
really so sure. But maybe we are in a context where a warning would get
into the log while a blow up would just make the whole machine
silent...

--
Michal Hocko
SUSE Labs
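[Editor's sketch] The warn-and-fail variant being discussed might look
like this, assuming the 4.18-era node_zonelist() and that
VM_WARN_ON_ONCE() compiles away without CONFIG_DEBUG_VM (the branch
itself would still cost a compare in production builds):

	static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
	{
		if (unlikely(!NODE_DATA(nid))) {
			/* bogus nid: log it, then let
			 * prepare_alloc_pages() fail cleanly */
			VM_WARN_ON_ONCE(1);
			return NULL;
		}
		return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
	}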
* [RFC 0/2] harden alloc_pages against bogus nid
From: Andrew Morton
Date: 2018-08-01 21:50 UTC
To: linux-arm-kernel

On Wed, 1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:

> The thread "avoid alloc memory on offline node"
>
>     https://lkml.org/lkml/2018/6/7/251
>
> asked at one point why kzalloc_node() was crashing rather than
> returning memory from a valid node. That thread ended up fixing the
> immediate causes of the crash, but left open the case of bad proximity
> values appearing in DSDT tables without corresponding SRAT/SLIT
> entries, as is happening on another machine.
>
> That is also easy to fix, but we should additionally harden the
> allocator sufficiently that it doesn't crash when passed an invalid
> node id. There are a couple of possible ways to do this, and I've
> attached two separate patches which individually fix that problem.
>
> The first detects the offline node before calling into the new_slab()
> code path, once it becomes apparent that the allocation isn't going to
> succeed. The second actually hardens node_zonelist() and
> prepare_alloc_pages() in the face of NODE_DATA(nid) returning a NULL
> zonelist. This latter case happens if the node has never been
> initialized or is possibly out of range. There are other places
> (NODE_DATA() & node_online()) which should be checking whether node
> ids are >= MAX_NUMNODES.

What is it that leads to a caller requesting memory from an invalid
node? A race against offlining? If so, then that's a lack of
appropriate locking, isn't it?

I don't see a problem with emitting a warning and then selecting a
different node so we can keep running. But we do want that warning, so
we can understand the root cause and fix it?
* [RFC 0/2] harden alloc_pages against bogus nid
From: Jeremy Linton
Date: 2018-08-01 22:56 UTC
To: linux-arm-kernel

Hi,

On 08/01/2018 04:50 PM, Andrew Morton wrote:
> On Wed, 1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
>
>> The thread "avoid alloc memory on offline node"
>>
>>     https://lkml.org/lkml/2018/6/7/251
>>
>> [...]
>
> What is it that leads to a caller requesting memory from an invalid
> node? A race against offlining? If so, then that's a lack of
> appropriate locking, isn't it?

There were a couple of unrelated cases, both having to do with the PXM
associated with a PCI port. In the first case, AFAIK, the domain
wouldn't really have been invalid if the entire SRAT had been parsed
and nodes created even where there weren't associated CPUs. The second
case (a different machine) is simply a PXM value that is completely
invalid (no associated SLIT/SRAT/etc entries) because the firmware gets
it wrong when a socket isn't populated.

There have been a few other suggested or merged patches for the
individual problems above; this set is just an attempt at avoiding a
full crash if/when another similar problem happens.

> I don't see a problem with emitting a warning and then selecting a
> different node so we can keep running. But we do want that warning, so
> we can understand the root cause and fix it?

Yes, we do want to know when an invalid id is passed; I will add the
VM_WARN in the first one.

The second one I wasn't sure about, as failing prepare_alloc_pages()
generates a couple of error messages, but the system then continues
operation.

I guess my question, though, is which method (or both/something else?)
is the preferred way to harden this up?

Thanks for looking at this.
* [RFC 0/2] harden alloc_pages against bogus nid
From: Andrew Morton
Date: 2018-08-02 0:14 UTC
To: linux-arm-kernel

On Wed, 1 Aug 2018 17:56:46 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:

> There were a couple of unrelated cases, both having to do with the PXM
> associated with a PCI port. In the first case, AFAIK, the domain
> wouldn't really have been invalid if the entire SRAT had been parsed
> and nodes created even where there weren't associated CPUs. The second
> case (a different machine) is simply a PXM value that is completely
> invalid (no associated SLIT/SRAT/etc entries) because the firmware
> gets it wrong when a socket isn't populated.
>
> There have been a few other suggested or merged patches for the
> individual problems above; this set is just an attempt at avoiding a
> full crash if/when another similar problem happens.

Please add the above info to the changelog.

> Yes, we do want to know when an invalid id is passed; I will add the
> VM_WARN in the first one.
>
> The second one I wasn't sure about, as failing prepare_alloc_pages()
> generates a couple of error messages, but the system then continues
> operation.
>
> I guess my question, though, is which method (or both/something
> else?) is the preferred way to harden this up?

The first patch looked neater. Can we get a WARN_ON in there as well?
* [RFC 0/2] harden alloc_pages against bogus nid
From: Jeremy Linton
Date: 2018-08-03 3:15 UTC
To: linux-arm-kernel

Hi,

On 08/01/2018 07:14 PM, Andrew Morton wrote:
> On Wed, 1 Aug 2018 17:56:46 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
>
>> There were a couple of unrelated cases, both having to do with the
>> PXM associated with a PCI port. [...]
>>
>> There have been a few other suggested or merged patches for the
>> individual problems above; this set is just an attempt at avoiding a
>> full crash if/when another similar problem happens.
>
> Please add the above info to the changelog.

Sure.

>> I guess my question, though, is which method (or both/something
>> else?) is the preferred way to harden this up?
>
> The first patch looked neater. Can we get a WARN_ON in there as well?

Yes,

Thanks,
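[Editor's sketch] Putting those two requests together, patch 1/2 with
the requested warning added might look like this. It is a sketch
against the same 4.18-era ___slab_alloc(); VM_WARN_ON_ONCE() is one
plausible spelling of the warning, on the assumption that it is
acceptable for it to compile away without CONFIG_DEBUG_VM:

	if (unlikely(!node_match(page, searchnode))) {
		stat(s, ALLOC_NODE_MISMATCH);
		deactivate_slab(s, page, c->freelist, c);
		if (unlikely(!node_online(searchnode))) {
			/* flag the bogus nid, then let the mempolicy
			 * pick a usable node via NUMA_NO_NODE */
			VM_WARN_ON_ONCE(1);
			node = NUMA_NO_NODE;
		}
		goto new_slab;
	}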
Thread overview: 15 messages

2018-08-01 20:04 [RFC 0/2] harden alloc_pages against bogus nid Jeremy Linton
2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
2018-08-02  9:15   ` Michal Hocko
2018-08-03  3:21     ` Jeremy Linton
2018-08-03  6:20       ` Michal Hocko
2018-08-02 14:23   ` Christopher Lameter
2018-08-03  3:12     ` Jeremy Linton
2018-08-01 20:04 ` [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes Jeremy Linton
2018-08-02  7:31   ` Michal Hocko
2018-08-03  3:17     ` Jeremy Linton
2018-08-03  6:24       ` Michal Hocko
2018-08-01 21:50 ` [RFC 0/2] harden alloc_pages against bogus nid Andrew Morton
2018-08-01 22:56   ` Jeremy Linton
2018-08-02  0:14     ` Andrew Morton
2018-08-03  3:15       ` Jeremy Linton