* FIX [1/2] slub: Do not dereference NULL pointer in node_match [not found] <20130123214514.370647954@linux.com> @ 2013-01-23 21:45 ` Christoph Lameter 2013-01-24 0:53 ` Simon Jeons 2013-02-01 10:23 ` Pekka Enberg 2013-01-23 21:45 ` FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor Christoph Lameter 1 sibling, 2 replies; 9+ messages in thread From: Christoph Lameter @ 2013-01-23 21:45 UTC (permalink / raw) To: Pekka Enberg Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia The variables accessed in slab_alloc are volatile and therefore the page pointer passed to node_match can be NULL. The processing of data in slab_alloc is tentative until either the cmpxhchg succeeds or the __slab_alloc slowpath is invoked. Both are able to perform the same allocation from the freelist. Check for the NULL pointer in node_match. A false positive will lead to a retry of the loop in __slab_alloc. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/slub.c =================================================================== --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600 +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600 @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache static inline int node_match(struct page *page, int node) { #ifdef CONFIG_NUMA - if (node != NUMA_NO_NODE && page_to_nid(page) != node) + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node)) return 0; #endif return 1; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match 2013-01-23 21:45 ` FIX [1/2] slub: Do not dereference NULL pointer in node_match Christoph Lameter @ 2013-01-24 0:53 ` Simon Jeons 2013-01-24 15:14 ` Christoph Lameter 2013-02-01 10:23 ` Pekka Enberg 1 sibling, 1 reply; 9+ messages in thread From: Simon Jeons @ 2013-01-24 0:53 UTC (permalink / raw) To: Christoph Lameter Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote: > The variables accessed in slab_alloc are volatile and therefore > the page pointer passed to node_match can be NULL. The processing > of data in slab_alloc is tentative until either the cmpxhchg > succeeds or the __slab_alloc slowpath is invoked. Both are > able to perform the same allocation from the freelist. > > Check for the NULL pointer in node_match. > > A false positive will lead to a retry of the loop in __slab_alloc. Hi Christoph, Since page_to_nid(NULL) will trigger bug, then how can run into __slab_alloc? > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/slub.c > =================================================================== > --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600 > +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600 > @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache > static inline int node_match(struct page *page, int node) > { > #ifdef CONFIG_NUMA > - if (node != NUMA_NO_NODE && page_to_nid(page) != node) > + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node)) > return 0; > #endif > return 1; > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match 2013-01-24 0:53 ` Simon Jeons @ 2013-01-24 15:14 ` Christoph Lameter 2013-01-25 8:11 ` Simon Jeons 0 siblings, 1 reply; 9+ messages in thread From: Christoph Lameter @ 2013-01-24 15:14 UTC (permalink / raw) To: Simon Jeons Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Wed, 23 Jan 2013, Simon Jeons wrote: > On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote: > > The variables accessed in slab_alloc are volatile and therefore > > the page pointer passed to node_match can be NULL. The processing > > of data in slab_alloc is tentative until either the cmpxhchg > > succeeds or the __slab_alloc slowpath is invoked. Both are > > able to perform the same allocation from the freelist. > > > > Check for the NULL pointer in node_match. > > > > A false positive will lead to a retry of the loop in __slab_alloc. > > Hi Christoph, > > Since page_to_nid(NULL) will trigger bug, then how can run into > __slab_alloc? page = NULL -> node_match(NULL, xx) = 0 -> call into __slab_alloc. __slab_alloc() will check for !c->page which requires the assignment of a new per cpu slab page. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match 2013-01-24 15:14 ` Christoph Lameter @ 2013-01-25 8:11 ` Simon Jeons 2013-01-25 14:53 ` Christoph Lameter 0 siblings, 1 reply; 9+ messages in thread From: Simon Jeons @ 2013-01-25 8:11 UTC (permalink / raw) To: Christoph Lameter Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Thu, 2013-01-24 at 15:14 +0000, Christoph Lameter wrote: > On Wed, 23 Jan 2013, Simon Jeons wrote: > > > On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote: > > > The variables accessed in slab_alloc are volatile and therefore > > > the page pointer passed to node_match can be NULL. The processing > > > of data in slab_alloc is tentative until either the cmpxhchg > > > succeeds or the __slab_alloc slowpath is invoked. Both are > > > able to perform the same allocation from the freelist. > > > > > > Check for the NULL pointer in node_match. > > > > > > A false positive will lead to a retry of the loop in __slab_alloc. > > > > Hi Christoph, > > > > Since page_to_nid(NULL) will trigger bug, then how can run into > > __slab_alloc? > > page = NULL > > -> > > node_match(NULL, xx) = 0 > > -> > > call into __slab_alloc. > > __slab_alloc() will check for !c->page which requires the assignment of a > new per cpu slab page. > But there are dereference in page_to_nid path, function page_to_section: return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match 2013-01-25 8:11 ` Simon Jeons @ 2013-01-25 14:53 ` Christoph Lameter 0 siblings, 0 replies; 9+ messages in thread From: Christoph Lameter @ 2013-01-25 14:53 UTC (permalink / raw) To: Simon Jeons Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Fri, 25 Jan 2013, Simon Jeons wrote: > > > > node_match(NULL, xx) = 0 > > > > -> > > > > call into __slab_alloc. > > > > __slab_alloc() will check for !c->page which requires the assignment of a > > new per cpu slab page. > > > > But there are dereference in page_to_nid path, function page_to_section: > return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; node_match() checks for NULL and will not invoke page_to_nid for a NULL pointer. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match 2013-01-23 21:45 ` FIX [1/2] slub: Do not dereference NULL pointer in node_match Christoph Lameter 2013-01-24 0:53 ` Simon Jeons @ 2013-02-01 10:23 ` Pekka Enberg 2013-02-01 11:51 ` Steven Rostedt 1 sibling, 1 reply; 9+ messages in thread From: Pekka Enberg @ 2013-02-01 10:23 UTC (permalink / raw) To: Christoph Lameter Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote: > The variables accessed in slab_alloc are volatile and therefore > the page pointer passed to node_match can be NULL. The processing > of data in slab_alloc is tentative until either the cmpxhchg > succeeds or the __slab_alloc slowpath is invoked. Both are > able to perform the same allocation from the freelist. > > Check for the NULL pointer in node_match. > > A false positive will lead to a retry of the loop in __slab_alloc. > > Signed-off-by: Christoph Lameter <cl@linux.com> Steven, how did you trigger the problem - i.e. is this -rt only problem? Does the patch work for you? > Index: linux/mm/slub.c > =================================================================== > --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600 > +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600 > @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache > static inline int node_match(struct page *page, int node) > { > #ifdef CONFIG_NUMA > - if (node != NUMA_NO_NODE && page_to_nid(page) != node) > + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node)) > return 0; > #endif > return 1; > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match 2013-02-01 10:23 ` Pekka Enberg @ 2013-02-01 11:51 ` Steven Rostedt 0 siblings, 0 replies; 9+ messages in thread From: Steven Rostedt @ 2013-02-01 11:51 UTC (permalink / raw) To: Pekka Enberg Cc: Christoph Lameter, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Fri, 2013-02-01 at 12:23 +0200, Pekka Enberg wrote: > On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote: > > The variables accessed in slab_alloc are volatile and therefore > > the page pointer passed to node_match can be NULL. The processing > > of data in slab_alloc is tentative until either the cmpxhchg > > succeeds or the __slab_alloc slowpath is invoked. Both are > > able to perform the same allocation from the freelist. > > > > Check for the NULL pointer in node_match. > > > > A false positive will lead to a retry of the loop in __slab_alloc. > > > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Steven, how did you trigger the problem - i.e. is this -rt only > problem? Does the patch work for you? I haven't tested Christoph's version yet. I've only tested my own. But I'll take his and run them through tests as well. This bug is not easy to hit. It is not a -rt only bug, and yes it probably should go to stable. The race is extremely small, but -rt creates scenarios that may only be hit by 1000 CPU core machines. Because of the preemptive nature of -rt, -rt is much more susceptible to race conditions than mainline. But these are real bugs for mainline too. It may only trigger once a year, where in -rt it will trigger once a week. -- Steve > > > Index: linux/mm/slub.c > > =================================================================== > > --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600 > > +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600 > > @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache > > static inline int node_match(struct page *page, int node) > > { > > #ifdef CONFIG_NUMA > > - if (node != NUMA_NO_NODE && page_to_nid(page) != node) > > + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node)) > > return 0; > > #endif > > return 1; > > > > -- > > To unsubscribe, send a message with 'unsubscribe linux-mm' in > > the body to majordomo@kvack.org. For more info on Linux MM, > > see: http://www.linux-mm.org/ . > > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor. [not found] <20130123214514.370647954@linux.com> 2013-01-23 21:45 ` FIX [1/2] slub: Do not dereference NULL pointer in node_match Christoph Lameter @ 2013-01-23 21:45 ` Christoph Lameter 2013-02-01 10:24 ` Pekka Enberg 1 sibling, 1 reply; 9+ messages in thread From: Christoph Lameter @ 2013-01-23 21:45 UTC (permalink / raw) To: Pekka Enberg Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia As Steven Rostedt has pointer out: Rescheduling could occur on a differnet processor after the determination of the per cpu pointer and before the tid is retrieved. This could result in allocation from the wrong node in slab_alloc. The effect is much more severe in slab_free() where we could free to the freelist of the wrong page. The window for something like that occurring is pretty small but it is possible. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/slub.c =================================================================== --- linux.orig/mm/slub.c 2013-01-23 15:06:39.805154107 -0600 +++ linux/mm/slub.c 2013-01-23 15:24:47.656868067 -0600 @@ -2331,13 +2331,18 @@ static __always_inline void *slab_alloc_ s = memcg_kmem_get_cache(s, gfpflags); redo: - /* * Must read kmem_cache cpu data via this cpu ptr. Preemption is * enabled. We may switch back and forth between cpus while * reading from one cpu area. That does not matter as long * as we end up on the original cpu again when doing the cmpxchg. + * + * Preemption is disabled for the retrieval of the tid because that + * must occur from the current processor. We cannot allow rescheduling + * on a different processor between the determination of the pointer + * and the retrieval of the tid. */ + preempt_disable(); c = __this_cpu_ptr(s->cpu_slab); /* @@ -2347,7 +2352,7 @@ redo: * linked list in between. */ tid = c->tid; - barrier(); + preempt_enable(); object = c->freelist; page = c->page; @@ -2594,10 +2599,11 @@ redo: * data is retrieved via this pointer. If we are on the same cpu * during the cmpxchg then the free will succedd. */ + preempt_disable(); c = __this_cpu_ptr(s->cpu_slab); tid = c->tid; - barrier(); + preempt_enable(); if (likely(page == c->page)) { set_freepointer(s, object, c->freelist); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor. 2013-01-23 21:45 ` FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor Christoph Lameter @ 2013-02-01 10:24 ` Pekka Enberg 0 siblings, 0 replies; 9+ messages in thread From: Pekka Enberg @ 2013-02-01 10:24 UTC (permalink / raw) To: Christoph Lameter Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes, elezegarcia On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote: > As Steven Rostedt has pointer out: Rescheduling could occur on a differnet processor > after the determination of the per cpu pointer and before the tid is retrieved. > This could result in allocation from the wrong node in slab_alloc. > > The effect is much more severe in slab_free() where we could free to the freelist > of the wrong page. > > The window for something like that occurring is pretty small but it is possible. > > Signed-off-by: Christoph Lameter <cl@linux.com> Okay, makes sense. Has anyone triggered this in practice? Do we want to tag this for -stable? > > Index: linux/mm/slub.c > =================================================================== > --- linux.orig/mm/slub.c 2013-01-23 15:06:39.805154107 -0600 > +++ linux/mm/slub.c 2013-01-23 15:24:47.656868067 -0600 > @@ -2331,13 +2331,18 @@ static __always_inline void *slab_alloc_ > > s = memcg_kmem_get_cache(s, gfpflags); > redo: > - > /* > * Must read kmem_cache cpu data via this cpu ptr. Preemption is > * enabled. We may switch back and forth between cpus while > * reading from one cpu area. That does not matter as long > * as we end up on the original cpu again when doing the cmpxchg. > + * > + * Preemption is disabled for the retrieval of the tid because that > + * must occur from the current processor. We cannot allow rescheduling > + * on a different processor between the determination of the pointer > + * and the retrieval of the tid. > */ > + preempt_disable(); > c = __this_cpu_ptr(s->cpu_slab); > > /* > @@ -2347,7 +2352,7 @@ redo: > * linked list in between. > */ > tid = c->tid; > - barrier(); > + preempt_enable(); > > object = c->freelist; > page = c->page; > @@ -2594,10 +2599,11 @@ redo: > * data is retrieved via this pointer. If we are on the same cpu > * during the cmpxchg then the free will succedd. > */ > + preempt_disable(); > c = __this_cpu_ptr(s->cpu_slab); > > tid = c->tid; > - barrier(); > + preempt_enable(); > > if (likely(page == c->page)) { > set_freepointer(s, object, c->freelist); > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-02-01 11:51 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20130123214514.370647954@linux.com> 2013-01-23 21:45 ` FIX [1/2] slub: Do not dereference NULL pointer in node_match Christoph Lameter 2013-01-24 0:53 ` Simon Jeons 2013-01-24 15:14 ` Christoph Lameter 2013-01-25 8:11 ` Simon Jeons 2013-01-25 14:53 ` Christoph Lameter 2013-02-01 10:23 ` Pekka Enberg 2013-02-01 11:51 ` Steven Rostedt 2013-01-23 21:45 ` FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor Christoph Lameter 2013-02-01 10:24 ` Pekka Enberg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).