* [RFC][PATCH] slub: Check for page NULL before doing the node_match check
@ 2013-01-17 18:10 Steven Rostedt
2013-01-17 18:37 ` Steven Rostedt
2013-01-17 21:22 ` Christoph Lameter
0 siblings, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 18:10 UTC (permalink / raw)
To: LKML, linux-mm
Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
In the -rt kernel (mrg), we hit the following dump:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
PGD a2d39067 PUD b1641067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU 3
Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
RIP: 0010:[<ffffffff811573f1>] [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
RSP: 0018:ffff8800a9b17d70 EFLAGS: 00010213
RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
Stack:
ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
0000000001200011 0000000001200011 0000000000000000 0000000000000000
00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
Call Trace:
[<ffffffff81202e08>] ? current_has_perm+0x68/0x80
[<ffffffff81041cbd>] copy_process+0xdd/0x15b0
[<ffffffff810a2125>] ? rt_up_read+0x25/0x30
[<ffffffff8104369a>] do_fork+0x5a/0x360
[<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
[<ffffffff8100b068>] sys_clone+0x28/0x30
[<ffffffff81527423>] stub_clone+0x13/0x20
[<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
RIP [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
RSP <ffff8800a9b17d70>
CR2: 0000000000000000
---[ end trace 0000000000000002 ]---
Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
disable migration. But the SLUB code is relatively lockless, and the
spin_locks there are raw_spin_locks (not converted to mutexes), thus I
believe this bug can happen in mainline without -rt features. The -rt
patch is just good at triggering mainline bugs ;-)
Anyway, looking at where this crashed, it seems that the page variable
can be NULL when passed to the node_match() function (which does not
check if it is NULL). When this happens we get the above panic.
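For reference, node_match() at this point looks roughly like this; it is
the page_to_nid() call that dereferences the NULL page:

static inline int node_match(struct page *page, int node)
{
#ifdef CONFIG_NUMA
	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
		return 0;
#endif
	return 1;
}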
As page is only used in slab_alloc() to check if the node matches, if
it's NULL I'm assuming that we can say it doesn't match and fall back to
the __slab_alloc() code. Is this a correct assumption?
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/mm/slub.c b/mm/slub.c
index 2d9511a..85b95d5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2399,7 +2399,7 @@ redo:
object = c->freelist;
page = c->page;
- if (unlikely(!object || !node_match(page, node)))
+ if (unlikely(!object || !page || !node_match(page, node)))
object = __slab_alloc(s, gfpflags, node, addr, c);
else {
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 18:10 [RFC][PATCH] slub: Check for page NULL before doing the node_match check Steven Rostedt
@ 2013-01-17 18:37 ` Steven Rostedt
2013-01-17 21:28 ` Christoph Lameter
2013-01-17 21:22 ` Christoph Lameter
1 sibling, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 18:37 UTC (permalink / raw)
To: LKML
Cc: linux-mm, Andrew Morton, Christoph Lameter, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 13:10 -0500, Steven Rostedt wrote:
> diff --git a/mm/slub.c b/mm/slub.c
> index 2d9511a..85b95d5 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2399,7 +2399,7 @@ redo:
>
> object = c->freelist;
> page = c->page;
> - if (unlikely(!object || !node_match(page, node)))
> + if (unlikely(!object || !page || !node_match(page, node)))
I'm still trying to see if c->freelist != NULL and c->page == NULL isn't
a bug. The cmpxchg_doubles are a little confusing. If it's not expected
that page is NULL while freelist isn't, then we need to figure out why it
happened.
-- Steve
> object = __slab_alloc(s, gfpflags, node, addr, c);
>
> else {
>
>
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 18:37 ` Steven Rostedt
@ 2013-01-17 21:28 ` Christoph Lameter
2013-01-17 21:36 ` Eric Dumazet
2013-01-17 21:43 ` Steven Rostedt
0 siblings, 2 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-17 21:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 17 Jan 2013, Steven Rostedt wrote:
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2399,7 +2399,7 @@ redo:
> >
> > object = c->freelist;
> > page = c->page;
> > - if (unlikely(!object || !node_match(page, node)))
> > + if (unlikely(!object || !page || !node_match(page, node)))
>
> I'm still trying to see if c->freelist != NULL and c->page == NULL isn't
> a bug. The cmpxchg_doubles are a little confusing. If it's not expected
> that page is NULL while freelist isn't, then we need to figure out why it
> happened.
hmmm.. We may want to change the sequence of updates to c->page and
c->freelist. Update c->freelist to be NULL first so that we always enter
the slow path for these cases where we can do more expensive
synchronization.
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2013-01-15 10:42:08.490183607 -0600
+++ linux/mm/slub.c 2013-01-17 15:27:48.973051155 -0600
@@ -1993,8 +1993,8 @@ static inline void flush_slab(struct kme
deactivate_slab(s, c->page, c->freelist);
c->tid = next_tid(c->tid);
- c->page = NULL;
c->freelist = NULL;
+ c->page = NULL;
}
/*
@@ -2227,8 +2227,8 @@ redo:
if (unlikely(!node_match(page, node))) {
stat(s, ALLOC_NODE_MISMATCH);
deactivate_slab(s, page, c->freelist);
- c->page = NULL;
c->freelist = NULL;
+ c->page = NULL;
goto new_slab;
}
@@ -2239,8 +2239,8 @@ redo:
*/
if (unlikely(!pfmemalloc_match(page, gfpflags))) {
deactivate_slab(s, page, c->freelist);
- c->page = NULL;
c->freelist = NULL;
+ c->page = NULL;
goto new_slab;
}
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:28 ` Christoph Lameter
@ 2013-01-17 21:36 ` Eric Dumazet
2013-01-17 21:44 ` Steven Rostedt
2013-01-17 21:43 ` Steven Rostedt
1 sibling, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2013-01-17 21:36 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 21:28 +0000, Christoph Lameter wrote:
> On Thu, 17 Jan 2013, Steven Rostedt wrote:
>
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2399,7 +2399,7 @@ redo:
> > >
> > > object = c->freelist;
> > > page = c->page;
> > > - if (unlikely(!object || !node_match(page, node)))
> > > + if (unlikely(!object || !page || !node_match(page, node)))
> >
> > I'm still trying to see if c->freelist != NULL and c->page == NULL isn't
> > a bug. The cmpxchg_doubles are a little confusing. If it's not expected
> > that page is NULL while freelist isn't, then we need to figure out why it
> > happened.
>
> hmmm.. We may want to change the sequence of updates to c->page and
> c->freelist. Update c->freelist to be NULL first so that we always enter
> the slow path for these cases where we can do more expensive
> synchronization.
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-15 10:42:08.490183607 -0600
> +++ linux/mm/slub.c 2013-01-17 15:27:48.973051155 -0600
> @@ -1993,8 +1993,8 @@ static inline void flush_slab(struct kme
> deactivate_slab(s, c->page, c->freelist);
>
> c->tid = next_tid(c->tid);
> - c->page = NULL;
> c->freelist = NULL;
> + c->page = NULL;
> }
>
> /*
> @@ -2227,8 +2227,8 @@ redo:
> if (unlikely(!node_match(page, node))) {
> stat(s, ALLOC_NODE_MISMATCH);
> deactivate_slab(s, page, c->freelist);
> - c->page = NULL;
> c->freelist = NULL;
> + c->page = NULL;
> goto new_slab;
> }
>
> @@ -2239,8 +2239,8 @@ redo:
> */
> if (unlikely(!pfmemalloc_match(page, gfpflags))) {
> deactivate_slab(s, page, c->freelist);
> - c->page = NULL;
> c->freelist = NULL;
> + c->page = NULL;
> goto new_slab;
> }
>
Without appropriate barriers, this change is a shot in the dark.
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:36 ` Eric Dumazet
@ 2013-01-17 21:44 ` Steven Rostedt
0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 21:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 13:36 -0800, Eric Dumazet wrote:
> > Index: linux/mm/slub.c
> > ===================================================================
> > --- linux.orig/mm/slub.c 2013-01-15 10:42:08.490183607 -0600
> > +++ linux/mm/slub.c 2013-01-17 15:27:48.973051155 -0600
> > @@ -1993,8 +1993,8 @@ static inline void flush_slab(struct kme
> > deactivate_slab(s, c->page, c->freelist);
> >
> > c->tid = next_tid(c->tid);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + c->page = NULL;
> > }
> >
> > /*
> > @@ -2227,8 +2227,8 @@ redo:
> > if (unlikely(!node_match(page, node))) {
> > stat(s, ALLOC_NODE_MISMATCH);
> > deactivate_slab(s, page, c->freelist);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + c->page = NULL;
> > goto new_slab;
> > }
> >
> > @@ -2239,8 +2239,8 @@ redo:
> > */
> > if (unlikely(!pfmemalloc_match(page, gfpflags))) {
> > deactivate_slab(s, page, c->freelist);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + c->page = NULL;
> > goto new_slab;
> > }
> >
>
> Without appropriate barriers, this change is a shot in the dark.
Totally agree.
-- Steve
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:28 ` Christoph Lameter
2013-01-17 21:36 ` Eric Dumazet
@ 2013-01-17 21:43 ` Steven Rostedt
2013-01-17 21:51 ` Christoph Lameter
[not found] ` <alpine.DEB.2.02.1301171547370.2774@gentwo.org>
1 sibling, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 21:43 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 21:28 +0000, Christoph Lameter wrote:
> On Thu, 17 Jan 2013, Steven Rostedt wrote:
>
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2399,7 +2399,7 @@ redo:
> > >
> > > object = c->freelist;
> > > page = c->page;
We should add a BUG_ON(!page) if it's a problem. I wasted a bit of time
finding this bug just because it triggered in a static inline function,
and I didn't have the vmlinux file to play with. I had to ask someone
else to do the work for me.
> > > - if (unlikely(!object || !node_match(page, node)))
> > > + if (unlikely(!object || !page || !node_match(page, node)))
> >
> > I'm still trying to see if c->freelist != NULL and c->page == NULL isn't
> > a bug. The cmpxchg_doubles are a little confusing. If it's not expected
> > that page is NULL while freelist isn't, then we need to figure out why it
> > happened.
>
> hmmm.. We may want to change the sequence of updates to c->page and
> c->freelist. Update c->freelist to be NULL first so that we always enter
> the slow path for these cases where we can do more expensive
> synchronization.
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-15 10:42:08.490183607 -0600
> +++ linux/mm/slub.c 2013-01-17 15:27:48.973051155 -0600
> @@ -1993,8 +1993,8 @@ static inline void flush_slab(struct kme
> deactivate_slab(s, c->page, c->freelist);
>
> c->tid = next_tid(c->tid);
> - c->page = NULL;
> c->freelist = NULL;
> + c->page = NULL;
I'm assuming that this is to deal with the same CPU being able to
re-enter this code (e.g. from an interrupt)?
If so, it requires "barrier()". If this can affect other CPUs, then we
need an smp_wmb() here, and an smp_rmb() where it matters.
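A minimal sketch of the pairing I mean (purely illustrative, and it
assumes the reader would load page before freelist, which is not what
the current fast path does):

	/* writer (deactivation path) */
	c->freelist = NULL;
	smp_wmb();		/* order the two stores for remote readers */
	c->page = NULL;

	/* reader */
	page = c->page;
	smp_rmb();		/* pairs with the smp_wmb() above */
	object = c->freelist;	/* if page was seen as NULL, so was freelist */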
-- Steve
> }
>
> /*
> @@ -2227,8 +2227,8 @@ redo:
> if (unlikely(!node_match(page, node))) {
> stat(s, ALLOC_NODE_MISMATCH);
> deactivate_slab(s, page, c->freelist);
> - c->page = NULL;
> c->freelist = NULL;
> + c->page = NULL;
> goto new_slab;
> }
>
> @@ -2239,8 +2239,8 @@ redo:
> */
> if (unlikely(!pfmemalloc_match(page, gfpflags))) {
> deactivate_slab(s, page, c->freelist);
> - c->page = NULL;
> c->freelist = NULL;
> + c->page = NULL;
> goto new_slab;
> }
>
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:43 ` Steven Rostedt
@ 2013-01-17 21:51 ` Christoph Lameter
2013-01-17 22:07 ` Steven Rostedt
2013-01-17 22:46 ` Steven Rostedt
[not found] ` <alpine.DEB.2.02.1301171547370.2774@gentwo.org>
1 sibling, 2 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-17 21:51 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 17 Jan 2013, Steven Rostedt wrote:
> > c->tid = next_tid(c->tid);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + c->page = NULL;
>
> I'm assuming that this is to deal with the same CPU being able to touch
> the code?
>
> If so, it requires "barrier()". If this can affect other CPUs, then we
> need a smp_wmb() here, and smp_rmb() where it matters.
This is dealing with the same cpu being interrupted. Some of these
segments are in interrupt disable sections so they are not affected.
The above is a section where interrupts are enabled so it needs the
barriers.
> > @@ -2227,8 +2227,8 @@ redo:
> > if (unlikely(!node_match(page, node))) {
> > stat(s, ALLOC_NODE_MISMATCH);
> > deactivate_slab(s, page, c->freelist);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + c->page = NULL;
> > goto new_slab;
> > }
> >
Interrupts are disabled so we do not need to change anything here.
> > @@ -2239,8 +2239,8 @@ redo:
> > */
> > if (unlikely(!pfmemalloc_match(page, gfpflags))) {
> > deactivate_slab(s, page, c->freelist);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + c->page = NULL;
> > goto new_slab;
> > }
> >
Ditto which leaves us with:
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2013-01-17 15:49:57.417491975 -0600
+++ linux/mm/slub.c 2013-01-17 15:50:49.010287150 -0600
@@ -1993,8 +1993,9 @@ static inline void flush_slab(struct kme
deactivate_slab(s, c->page, c->freelist);
c->tid = next_tid(c->tid);
- c->page = NULL;
c->freelist = NULL;
+ barrier();
+ c->page = NULL;
}
/*
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:51 ` Christoph Lameter
@ 2013-01-17 22:07 ` Steven Rostedt
2013-01-17 22:46 ` Steven Rostedt
1 sibling, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 22:07 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 21:51 +0000, Christoph Lameter wrote:
> Ditto which leaves us with:
Except...
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-17 15:49:57.417491975 -0600
> +++ linux/mm/slub.c 2013-01-17 15:50:49.010287150 -0600
> @@ -1993,8 +1993,9 @@ static inline void flush_slab(struct kme
> deactivate_slab(s, c->page, c->freelist);
>
> c->tid = next_tid(c->tid);
> - c->page = NULL;
> c->freelist = NULL;
+ /*
+ * Preemption may be enabled here, and if the task is preempted,
+ * other tasks require that freelist already be NULL whenever
+ * page is NULL.
+ */
> + barrier();
> + c->page = NULL;
> }
>
> /*
-- Steve
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:51 ` Christoph Lameter
2013-01-17 22:07 ` Steven Rostedt
@ 2013-01-17 22:46 ` Steven Rostedt
2013-01-17 23:10 ` Steven Rostedt
1 sibling, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 22:46 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 21:51 +0000, Christoph Lameter wrote:
> This is dealing with the same cpu being interrupted. Some of these
> segments are in interrupt disable sections so they are not affected.
Except that we are not always on the same CPU. Now I'm looking at
mainline (not modified by -rt):
From slab_alloc_node():
/*
* Must read kmem_cache cpu data via this cpu ptr. Preemption is
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
*/
c = __this_cpu_ptr(s->cpu_slab);
/*
* The transaction ids are globally unique per cpu and per operation on
* a per cpu queue. Thus they can be guarantee that the cmpxchg_double
* occurs on the right processor and that there was no operation on the
* linked list in between.
*/
tid = c->tid;
barrier();
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node)))
object = __slab_alloc(s, gfpflags, node, addr, c);
This is where we hit the bug on -rt, and we can most certainly hit it on
mainline too. This code does not disable preemption (the comment even
states that). So if we switch CPUs after reading __this_cpu_ptr(), we are
still accessing the 'c' pointer of the CPU we left. Hence, there's
nothing protecting against c->page being NULL while c->freelist is not
NULL.
-- Steve
--
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 22:46 ` Steven Rostedt
@ 2013-01-17 23:10 ` Steven Rostedt
2013-01-17 23:20 ` [RFC][PATCH] slub: Keep page and object in sync in slab_alloc_node() Steven Rostedt
2013-01-18 14:43 ` [RFC][PATCH] slub: Check for page NULL before doing the node_match check Christoph Lameter
0 siblings, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 23:10 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 17:46 -0500, Steven Rostedt wrote:
> On Thu, 2013-01-17 at 21:51 +0000, Christoph Lameter wrote:
>
> > This is dealing with the same cpu being interrupted. Some of these
> > segments are in interrupt disable sections so they are not affected.
>
> Except that we are not always on the same CPU. Now I'm looking at
> mainline (non modified by -rt):
Because there's also nothing keeping page and object related to each
other, we may just need to do:
>
> From slab_alloc_node():
>
> /*
> * Must read kmem_cache cpu data via this cpu ptr. Preemption is
> * enabled. We may switch back and forth between cpus while
> * reading from one cpu area. That does not matter as long
> * as we end up on the original cpu again when doing the cmpxchg.
> */
local_irq_save(flags);
> c = __this_cpu_ptr(s->cpu_slab);
>
> /*
> * The transaction ids are globally unique per cpu and per operation on
> * a per cpu queue. Thus they can be guarantee that the cmpxchg_double
> * occurs on the right processor and that there was no operation on the
> * linked list in between.
> */
> tid = c->tid;
> barrier();
>
> object = c->freelist;
> page = c->page;
r = !object || !node_match(page, node);
local_irq_restore(flags);
if (unlikely(r)) {
> object = __slab_alloc(s, gfpflags, node, addr, c);
>
I was thinking at first that we could use preempt_disable(), but if an
interrupt comes in after we set object = c->freelist and changes
c->page, then the freelist and page become disassociated again.
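For the record, the preempt_disable() variant I was toying with would be
roughly the following (just a sketch, not a patch); the window marked
below is why I dropped it:

	preempt_disable();
	c = __this_cpu_ptr(s->cpu_slab);

	tid = c->tid;
	barrier();

	object = c->freelist;
	/* an interrupt here can still update c->page and c->freelist */
	page = c->page;

	preempt_enable();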
-- Steve
--
* [RFC][PATCH] slub: Keep page and object in sync in slab_alloc_node()
2013-01-17 23:10 ` Steven Rostedt
@ 2013-01-17 23:20 ` Steven Rostedt
2013-01-18 0:23 ` Steven Rostedt
2013-01-18 14:43 ` [RFC][PATCH] slub: Check for page NULL before doing the node_match check Christoph Lameter
1 sibling, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 23:20 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
In slab_alloc_node(), after the cpu_slab is assigned, if the task is
preempted and moves to another CPU, there's nothing keeping the page and
object in sync. The -rt kernel crashed because page was NULL and object
was not, and the node_match() dereferences page. Even though the crash
happened on -rt, there's nothing that's keeping this from happening on
mainline.
The easiest fix is to disable interrupts for the entire time from
acquiring the current CPU cpu_slab and assigning the object and page.
After that, it's fine to allow preemption.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/mm/slub.c b/mm/slub.c
index 2fdd96f..c00ce80 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2317,6 +2317,8 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
struct kmem_cache_cpu *c;
struct page *page;
unsigned long tid;
+ unsigned long flags;
+ int new_object;
if (slab_pre_alloc_hook(s, gfpflags))
return NULL;
@@ -2328,7 +2330,10 @@ redo:
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
+ *
+ * But we need to sync the setting of page and object.
*/
+ local_irq_save(flags);
c = __this_cpu_ptr(s->cpu_slab);
/*
@@ -2338,11 +2343,14 @@ redo:
* linked list in between.
*/
tid = c->tid;
- barrier();
object = c->freelist;
page = c->page;
- if (unlikely(!object || !node_match(page, node)))
+
+ new_object = !object || !node_match(page, node);
+ local_irq_restore(flags);
+
+ if (new_object)
object = __slab_alloc(s, gfpflags, node, addr, c);
else {
--
* Re: [RFC][PATCH] slub: Keep page and object in sync in slab_alloc_node()
2013-01-17 23:20 ` [RFC][PATCH] slub: Keep page and object in sync in slab_alloc_node() Steven Rostedt
@ 2013-01-18 0:23 ` Steven Rostedt
2013-01-18 0:28 ` [RFC][PATCH v2] " Steven Rostedt
0 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 0:23 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 18:20 -0500, Steven Rostedt wrote:
> object = c->freelist;
> page = c->page;
Hmm, having local_irq_restore() here is probably just as good, as object
and page were grabbed together under it. It doesn't change the condition
below any.
/me updates patch.
-- Steve
> - if (unlikely(!object || !node_match(page, node)))
> +
> + new_object = !object || !node_match(page, node);
> + local_irq_restore(flags);
> +
> + if (new_object)
> object = __slab_alloc(s, gfpflags, node, addr, c);
>
> else {
>
--
* [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 0:23 ` Steven Rostedt
@ 2013-01-18 0:28 ` Steven Rostedt
2013-01-18 4:42 ` Joonsoo Kim
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 0:28 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
In slab_alloc_node(), after the cpu_slab is assigned, if the task is
preempted and moves to another CPU, there's nothing keeping the page and
object in sync. The -rt kernel crashed because page was NULL and object
was not, and the node_match() dereferences page. Even though the crash
happened on -rt, there's nothing that's keeping this from happening on
mainline.
The easiest fix is to disable interrupts for the entire time from
acquiring the current CPU cpu_slab and assigning the object and page.
After that, it's fine to allow preemption.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/mm/slub.c b/mm/slub.c
index ba2ca53..f0681db 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2325,6 +2325,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
struct kmem_cache_cpu *c;
struct page *page;
unsigned long tid;
+ unsigned long flags;
if (slab_pre_alloc_hook(s, gfpflags))
return NULL;
@@ -2337,7 +2338,10 @@ redo:
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
+ *
+ * But we need to sync the setting of page and object.
*/
+ local_irq_save(flags);
c = __this_cpu_ptr(s->cpu_slab);
/*
@@ -2347,10 +2351,11 @@ redo:
* linked list in between.
*/
tid = c->tid;
- barrier();
object = c->freelist;
page = c->page;
+ local_irq_restore(flags);
+
if (unlikely(!object || !node_match(page, node)))
object = __slab_alloc(s, gfpflags, node, addr, c);
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 0:28 ` [RFC][PATCH v2] " Steven Rostedt
@ 2013-01-18 4:42 ` Joonsoo Kim
2013-01-18 14:52 ` Christoph Lameter
2013-01-18 15:29 ` Steven Rostedt
2013-01-18 14:44 ` Christoph Lameter
2013-01-18 15:09 ` [RFC][PATCH v3] " Steven Rostedt
2 siblings, 2 replies; 34+ messages in thread
From: Joonsoo Kim @ 2013-01-18 4:42 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
Hello, Steven.
On Thu, Jan 17, 2013 at 07:28:44PM -0500, Steven Rostedt wrote:
> In slab_alloc_node(), after the cpu_slab is assigned, if the task is
> preempted and moves to another CPU, there's nothing keeping the page and
> object in sync. The -rt kernel crashed because page was NULL and object
> was not, and the node_match() dereferences page. Even though the crash
> happened on -rt, there's nothing that's keeping this from happening on
> mainline.
>
> The easiest fix is to disable interrupts for the entire time from
> acquiring the current CPU cpu_slab and assigning the object and page.
> After that, it's fine to allow preemption.
How about this?
It's based on v3.8-rc3.
I haven't tested this patch yet.
Just sharing my idea for fixing the problem.
-----------------8<-----------------------
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 4:42 ` Joonsoo Kim
@ 2013-01-18 14:52 ` Christoph Lameter
2013-01-18 15:29 ` Steven Rostedt
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-18 14:52 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Steven Rostedt, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 18 Jan 2013, Joonsoo Kim wrote:
> How about this?
The easiest fix is to just check for NULL in node_match since a similar
situation can occur in slab_free.
Subject: slub: Do not dereference NULL pointer in node_match
The variables accessed in slab_alloc are volatile and therefore
the page pointer passed to node_match can be NULL. The processing
of data in slab_alloc is tentative until either the cmpxchg
succeeds or the __slab_alloc slowpath is invoked. Both are
able to perform the same allocation from the freelist.
Check for the NULL pointer in node_match.
A false positive will lead to a retry of the loop in slab_alloc().
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600
+++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600
@@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
static inline int node_match(struct page *page, int node)
{
#ifdef CONFIG_NUMA
- if (node != NUMA_NO_NODE && page_to_nid(page) != node)
+ if (page && (node != NUMA_NO_NODE && page_to_nid(page) != node))
return 0;
#endif
return 1;
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 4:42 ` Joonsoo Kim
2013-01-18 14:52 ` Christoph Lameter
@ 2013-01-18 15:29 ` Steven Rostedt
1 sibling, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 15:29 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 2013-01-18 at 13:42 +0900, Joonsoo Kim wrote:
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 9db4825..b54dffa 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -46,6 +46,9 @@ enum stat_item {
> struct kmem_cache_cpu {
> void **freelist; /* Pointer to next available object */
> unsigned long tid; /* Globally unique transaction id */
> +#ifdef CONFIG_NUMA
> + int node;
Note, you put an int between a long and a pointer, which will waste 4
bytes on 64-bit machines (the following struct page pointer has to be
8-byte aligned, so 4 bytes of padding get inserted).
> +#endif
> struct page *page; /* The slab from which we are allocating */
> struct page *partial; /* Partially allocated frozen slabs */
> #ifdef CONFIG_SLUB_STATS
> @@ -2038,10 +2049,10 @@ static void flush_all(struct kmem_cache *s)
> * Check if the objects in a per cpu structure fit numa
> * locality expectations.
> */
> -static inline int node_match(struct page *page, int node)
> +static inline int node_match(struct kmem_cache_cpu *c, int node)
> {
> #ifdef CONFIG_NUMA
> - if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> + if (node != NUMA_NO_NODE && c->node != node)
We still have the issue of the CPU fetching c->node before c->tid and
c->freelist.
I still believe the only solution is to prevent the task from migrating
via a preempt disable.
-- Steve
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 0:28 ` [RFC][PATCH v2] " Steven Rostedt
2013-01-18 4:42 ` Joonsoo Kim
@ 2013-01-18 14:44 ` Christoph Lameter
2013-01-18 15:04 ` Steven Rostedt
2013-01-18 15:09 ` [RFC][PATCH v3] " Steven Rostedt
2 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2013-01-18 14:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 17 Jan 2013, Steven Rostedt wrote:
> In slab_alloc_node(), after the cpu_slab is assigned, if the task is
> preempted and moves to another CPU, there's nothing keeping the page and
> object in sync. The -rt kernel crashed because page was NULL and object
> was not, and the node_match() dereferences page. Even though the crash
> happened on -rt, there's nothing that's keeping this from happening on
> mainline.
>
> The easiest fix is to disable interrupts for the entire time from
> acquiring the current CPU cpu_slab and assigning the object and page.
> After that, it's fine to allow preemption.
It's easiest to just check for the NULL pointer as initially done. The call
to __slab_alloc can do what the fastpath does.
And the fastpath will verify that the c->page pointer was not changed.
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 14:44 ` Christoph Lameter
@ 2013-01-18 15:04 ` Steven Rostedt
2013-01-18 15:55 ` Steven Rostedt
2013-01-18 18:23 ` Christoph Lameter
0 siblings, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 15:04 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 2013-01-18 at 14:44 +0000, Christoph Lameter wrote:
> On Thu, 17 Jan 2013, Steven Rostedt wrote:
>
> > In slab_alloc_node(), after the cpu_slab is assigned, if the task is
> > preempted and moves to another CPU, there's nothing keeping the page and
> > object in sync. The -rt kernel crashed because page was NULL and object
> > was not, and the node_match() dereferences page. Even though the crash
> > happened on -rt, there's nothing that's keeping this from happening on
> > mainline.
> >
> > The easiest fix is to disable interrupts for the entire time from
> > acquiring the current CPU cpu_slab and assigning the object and page.
> > After that, it's fine to allow preemption.
>
> Its easiest to just check for the NULL pointer as initally done. The call
> to __slab_alloc can do what the fastpath does.
>
> And the fastpath will verify that the c->page pointer was not changed.
The problem is that the changes can happen on another CPU, which means
that barrier isn't sufficient.
     CPU0                                CPU1
     ----                                ----
<cpu fetches c->page>
                                         updates c->tid
                                         updates c->page
                                         updates c->freelist
<cpu fetches c->tid>
<cpu fetches c->freelist>

node_match() succeeds even though
current c->page wont

                                         this_cpu_cmpxchg_double() only tests
                                         the object (freelist) and tid, both which
                                         will match, but the page that was tested
                                         isn't the right one.
That barrier() is meaningless as soon as another CPU is involved. The
CPU can order things any way it wants, even if the assembly orders them
differently. Due to cacheline misses and such, we have no idea whether
c->page has already been fetched or not.
We may get by with just disabling preemption and testing for page ==
NULL (just in case an interrupt comes in between reading object and page
and resets them). But we can't grab freelist and page if c points to
another CPU's cpu_slab.
-- Steve
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 15:04 ` Steven Rostedt
@ 2013-01-18 15:55 ` Steven Rostedt
2013-01-18 18:29 ` Christoph Lameter
2013-01-21 8:11 ` Joonsoo Kim
2013-01-18 18:23 ` Christoph Lameter
1 sibling, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 15:55 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 2013-01-18 at 10:04 -0500, Steven Rostedt wrote:
Just to be more complete:
>      CPU0                              CPU1
>      ----                              ----
                                         c = __this_cpu_ptr(s->cpu_slab);
                                         <migrates to CPU0>
> <cpu fetches c->page>
                                         <another task>
>                                        updates c->tid
>                                        updates c->page
>                                        updates c->freelist
> <cpu fetches c->tid>
> <cpu fetches c->freelist>
>
> node_match() succeeds even though
> current c->page wont
>
                                         <migrates back to CPU 1>
>                                        this_cpu_cmpxchg_double() only tests
>                                        the object (freelist) and tid, both which
>                                        will match, but the page that was tested
>                                        isn't the right one.
>
Yes, it's very unlikely, but we are in the business of dealing with the
very unlikely. That's because in our business, the very unlikely is very
likely. Damn, I need to buy a lotto ticket!
-- Steve
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 15:55 ` Steven Rostedt
@ 2013-01-18 18:29 ` Christoph Lameter
2013-01-18 18:52 ` Steven Rostedt
2013-01-21 8:11 ` Joonsoo Kim
1 sibling, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2013-01-18 18:29 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 18 Jan 2013, Steven Rostedt wrote:
> On Fri, 2013-01-18 at 10:04 -0500, Steven Rostedt wrote:
>
> Just to be more complete:
>
> > CPU0 CPU1
> > ---- ----
> c = __this_cpu_ptr(s->cpu_slab);
> <migrates to CPU0>
>
> > <cpu fetches c->page>
> <another task>
>
> > updates c->tid
We can avoid the above scenario by doing a cpu local fetch.
i.e.
	tid = this_cpu_read(s->cpu_slab->tid);
> > updates c->page
> > updates c->freelist
> > <cpu fetches c->tid>
> > <cpu fetches c->freelist>
> >
> > node_match() succeeds even though
> > current c->page wont
> >
>
> <migrates back to CPU 1>
>
> > this_cpu_cmpxchg_double() only tests
> > the object (freelist) and tid, both which
> > will match, but the page that was tested
> > isn't the right one.
> >
>
> Yes, it's very unlikely, but we are in the business of dealing with the
> very unlikely. That's because in our business, the very unlikely is very
> likely. Damn, I need to buy a lotto ticket!
Well, the consequence would be that an object from a node other than the
desired one will be allocated. Not that severe of an issue.
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 18:29 ` Christoph Lameter
@ 2013-01-18 18:52 ` Steven Rostedt
2013-01-21 1:48 ` Christoph Lameter
0 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 18:52 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 2013-01-18 at 18:29 +0000, Christoph Lameter wrote:
> On Fri, 18 Jan 2013, Steven Rostedt wrote:
>
> > On Fri, 2013-01-18 at 10:04 -0500, Steven Rostedt wrote:
> >
> > Just to be more complete:
> >
> > > CPU0 CPU1
> > > ---- ----
> > c = __this_cpu_ptr(s->cpu_slab);
> > <migrates to CPU0>
> >
> > > <cpu fetches c->page>
> > <another task>
> >
> > > updates c->tid
>
> We can avoid the above scenario by doing a cpu local fetch.
>
> i.e.
>	tid = this_cpu_read(s->cpu_slab->tid);
I'm curious as to why we don't just add the preempt disable? It's rather
quick and avoids all this complex trickery, which is just prone to bugs.
It would make it much easier for others to review as well, and it also
keeps the setting of page, object and cpu_slab consistent with everything
else (which is assigned under preempt(irq)_disable).
>
>
> > > updates c->page
> > > updates c->freelist
> > > <cpu fetches c->tid>
> > > <cpu fetches c->freelist>
> > >
> > > node_match() succeeds even though
> > > current c->page wont
> > >
> >
> > <migrates back to CPU 1>
> >
> > > this_cpu_cmpxchg_double() only tests
> > > the object (freelist) and tid, both which
> > > will match, but the page that was tested
> > > isn't the right one.
> > >
> >
> > Yes, it's very unlikely, but we are in the business of dealing with the
> > very unlikely. That's because in our business, the very unlikely is very
> > likely. Damn, I need to buy a lotto ticket!
>
> Well, the consequence would be that an object from another node than
> desired will be allocated. Not that severe of an issue.
Yes, it's not that severe of an issue, but it is still incorrect code.
Why not just allocate on whatever node you want then? Why bother with
the check at all?
-- Steve
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 18:52 ` Steven Rostedt
@ 2013-01-21 1:48 ` Christoph Lameter
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-21 1:48 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 18 Jan 2013, Steven Rostedt wrote:
> I'm curious to why not just add the preempt disable? It's rather quick
> and avoids all this complex trickery, which is just prone to bugs. It
> would make it much easier for others to review as well, and also keeps
> the setting of page, objects and cpu_slab consistent with everything
> else (which is assigned under preempt(irq)_disable).
Because this_cpu_read() does not need to do a preempt disable on x86,
nor on any other arch that will support this_cpu_read(). this_cpu_read()
is implementable on many platforms with a register / offset in the same
way as on x86.
> > Well, the consequence would be that an object from another node than
> > desired will be allocated. Not that severe of an issue.
>
> Yes, it's not that severe of an issue, but it is still incorrect code.
> Why not just allocate on whatever node you want then? Why bother with
> the check at all?
The check so far has worked correctly for all tests.
Just because a rare race condition has been detected that may cause an
incorrect allocation does not mean that the check has no purpose at all.
And of course it needs to be fixed.
My patch with the check for page == NULL is enough to fix the potential
NULL pointer deref (which is also another case of a rare race that has
survived lots of tests so far).
The other issue with the wrong node needs some more thought and some
testing of the impact on instruction overhead.
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 15:55 ` Steven Rostedt
2013-01-18 18:29 ` Christoph Lameter
@ 2013-01-21 8:11 ` Joonsoo Kim
2013-01-21 12:19 ` Steven Rostedt
1 sibling, 1 reply; 34+ messages in thread
From: Joonsoo Kim @ 2013-01-21 8:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, Jan 18, 2013 at 10:55:01AM -0500, Steven Rostedt wrote:
> On Fri, 2013-01-18 at 10:04 -0500, Steven Rostedt wrote:
>
> Just to be more complete:
>
> > CPU0 CPU1
> > ---- ----
> c = __this_cpu_ptr(s->cpu_slab);
> <migrates to CPU0>
>
> > <cpu fetches c->page>
> <another task>
>
> > updates c->tid
> > updates c->page
> > updates c->freelist
> > <cpu fetches c->tid>
> > <cpu fetches c->freelist>
> >
> > node_match() succeeds even though
> > current c->page wont
> >
>
> <migrates back to CPU 1>
Hello.
I have one stupid question, just out of curiosity.
Does the processor re-order instructions which load data from the same cacheline?
> > this_cpu_cmpxchg_double() only tests
> > the object (freelist) and tid, both which
> > will match, but the page that was tested
> > isn't the right one.
> >
>
> Yes, it's very unlikely, but we are in the business of dealing with the
> very unlikely. That's because in our business, the very unlikely is very
> likely. Damn, I need to buy a lotto ticket!
>
> -- Steve
>
>
> --
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-21 8:11 ` Joonsoo Kim
@ 2013-01-21 12:19 ` Steven Rostedt
0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-21 12:19 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Mon, 2013-01-21 at 17:11 +0900, Joonsoo Kim wrote:
> Hello.
> I have one stupid question, just out of curiosity.
> Does the processor re-order instructions which load data from the same cacheline?
>
Probably not, but I wouldn't have generic code assuming cacheline sizes.
-- Steve
--
* Re: [RFC][PATCH v2] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 15:04 ` Steven Rostedt
2013-01-18 15:55 ` Steven Rostedt
@ 2013-01-18 18:23 ` Christoph Lameter
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-18 18:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 18 Jan 2013, Steven Rostedt wrote:
> The problem is that the changes can happen on another CPU, which means
> that barrier isn't sufficient.
No, these are per-cpu variables that are only read locally. The
synchronization is not between processors.
--
* [RFC][PATCH v3] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 0:28 ` [RFC][PATCH v2] " Steven Rostedt
2013-01-18 4:42 ` Joonsoo Kim
2013-01-18 14:44 ` Christoph Lameter
@ 2013-01-18 15:09 ` Steven Rostedt
2013-01-18 18:40 ` Christoph Lameter
2 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 15:09 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
In slab_alloc_node(), after the cpu_slab is assigned, if the task is
preempted and moves to another CPU, there's nothing keeping the page and
object in sync. The -rt kernel crashed because page was NULL and object
was not, and the node_match() dereferences page. Even though the crash
happened on -rt, there's nothing that's keeping this from happening on
mainline.
The easiest fix is to disable preemption for the entire time from
acquiring the current CPU cpu_slab and assigning the object and page.
After that, it's fine to allow preemption.
Also add a check if page is NULL in node_match().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/mm/slub.c b/mm/slub.c
index ba2ca53..10714ee 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache *s)
static inline int node_match(struct page *page, int node)
{
#ifdef CONFIG_NUMA
- if (node != NUMA_NO_NODE && page_to_nid(page) != node)
+ if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
return 0;
#endif
return 1;
@@ -2337,7 +2337,10 @@ redo:
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
+ *
+ * But we need to sync the setting of page and object.
*/
+ preempt_disable();
c = __this_cpu_ptr(s->cpu_slab);
/*
@@ -2347,10 +2350,14 @@ redo:
* linked list in between.
*/
tid = c->tid;
+
+ /* Must have tid first in case an interrupt comes in */
barrier();
object = c->freelist;
page = c->page;
+ preempt_enable();
+
if (unlikely(!object || !node_match(page, node)))
object = __slab_alloc(s, gfpflags, node, addr, c);
--
* Re: [RFC][PATCH v3] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 15:09 ` [RFC][PATCH v3] " Steven Rostedt
@ 2013-01-18 18:40 ` Christoph Lameter
2013-01-18 19:09 ` Eric Dumazet
0 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2013-01-18 18:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 18 Jan 2013, Steven Rostedt wrote:
> @@ -2337,7 +2337,10 @@ redo:
> * enabled. We may switch back and forth between cpus while
> * reading from one cpu area. That does not matter as long
> * as we end up on the original cpu again when doing the cmpxchg.
> + *
> + * But we need to sync the setting of page and object.
> */
> + preempt_disable();
> c = __this_cpu_ptr(s->cpu_slab);
>
> /*
> @@ -2347,10 +2350,14 @@ redo:
> * linked list in between.
> */
> tid = c->tid;
The fetching of the tid is the only critical thing here. If the tid is
retrieved from the right cpu then the cmpxchg will fail if any changes
occurred to freelist or the page variable.
The tid can be retrieved without disabling preemption through
this_cpu_read().
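For reference, the fast path cmpxchg being discussed is roughly the
following (paraphrased from mm/slub.c of this era); note that page is not
part of the comparison:

	if (unlikely(!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			get_freepointer_safe(s, object), next_tid(tid)))) {

		note_cmpxchg_failure("slab_alloc", s, tid);
		goto redo;
	}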
--
* Re: [RFC][PATCH v3] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 18:40 ` Christoph Lameter
@ 2013-01-18 19:09 ` Eric Dumazet
2013-01-18 19:20 ` Steven Rostedt
2013-01-21 1:40 ` Christoph Lameter
0 siblings, 2 replies; 34+ messages in thread
From: Eric Dumazet @ 2013-01-18 19:09 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 2013-01-18 at 18:40 +0000, Christoph Lameter wrote:
> The fetching of the tid is the only critical thing here. If the tid is
> retrieved from the right cpu then the cmpxchg will fail if any changes
> occured to freelist or the page variable.
>
> The tid can be retrieved without disabling preemption through
> this_cpu_read().
Strictly speaking, this_cpu_read() _does_ disable preemption.
Of course, on x86, this_cpu_read() uses __this_cpu_read()
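(For reference, the generic fallback is roughly the following, paraphrased
from include/linux/percpu.h of this era; on x86 the whole thing collapses
to a single %gs-relative load instead:)

#define _this_cpu_generic_read(pcp)					\
({	typeof(pcp) ret__;						\
	preempt_disable();						\
	ret__ = *this_cpu_ptr(&(pcp));					\
	preempt_enable();						\
	ret__;								\
})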
--
* Re: [RFC][PATCH v3] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 19:09 ` Eric Dumazet
@ 2013-01-18 19:20 ` Steven Rostedt
2013-01-21 1:40 ` Christoph Lameter
1 sibling, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-18 19:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 2013-01-18 at 11:09 -0800, Eric Dumazet wrote:
> On Fri, 2013-01-18 at 18:40 +0000, Christoph Lameter wrote:
>
> > The fetching of the tid is the only critical thing here. If the tid is
> > retrieved from the right cpu then the cmpxchg will fail if any changes
> > occurred to freelist or the page variable.
> >
> > The tid can be retrieved without disabling preemption through
> > this_cpu_read().
>
> Strictly speaking, this_cpu_read() _does_ disable preemption.
I was thinking the same thing.
>
> Of course, on x86, this_cpu_read() uses __this_cpu_read().
Looking at using this_cpu_read() on tid, I can't see what would break.
We still need the check for page being NULL in node_match(). I'm still
uneasy about it, though. Just because we can't see how it might break
doesn't mean that it won't, as we have already found a few bugs in the
current code. I'd feel more comfortable with the explicit
preempt_disable(), and this is coming from an -rt guy who tries to avoid
preempt_disable(). But you're (Christoph) the maintainer.
I guess if you use this_cpu_read() you don't need that barrier anymore.
-- Steve
* Re: [RFC][PATCH v3] slub: Keep page and object in sync in slab_alloc_node()
2013-01-18 19:09 ` Eric Dumazet
2013-01-18 19:20 ` Steven Rostedt
@ 2013-01-21 1:40 ` Christoph Lameter
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-21 1:40 UTC (permalink / raw)
To: Eric Dumazet
Cc: Steven Rostedt, LKML, linux-mm, Andrew Morton, Pekka Enberg,
Matt Mackall, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Fri, 18 Jan 2013, Eric Dumazet wrote:
> On Fri, 2013-01-18 at 18:40 +0000, Christoph Lameter wrote:
>
> > The fetching of the tid is the only critical thing here. If the tid is
> > retrieved from the right cpu then the cmpxchg will fail if any changes
> > occurred to freelist or the page variable.
> >
> > The tid can be retrieved without disabling preemption through
> > this_cpu_read().
>
> Strictly speaking, this_cpu_read() _does_ disable preemption.
Yes, on x86 the relocation and the fetch are a single instruction, and
therefore no preemption can occur.
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 23:10 ` Steven Rostedt
2013-01-17 23:20 ` [RFC][PATCH] slub: Keep page and object in sync in slab_alloc_node() Steven Rostedt
@ 2013-01-18 14:43 ` Christoph Lameter
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-18 14:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 17 Jan 2013, Steven Rostedt wrote:
> Because there's also nothing to keep page related to object either, we
> may just need to do:
Adding the NULL check is satisfactory, I think. The TID check guarantees
that nothing happened in between, and the call to __slab_alloc() can do
the same as the fastpath if it mistakenly goes to the slowpath.
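One way to express the NULL check is to fold it into node_match() itself; a sketch (close in spirit to what the thread converges on, though the exact placement in the RFC patch may differ):

	static inline int node_match(struct page *page, int node)
	{
	#ifdef CONFIG_NUMA
		/* A NULL page can never satisfy a node constraint. */
		if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
			return 0;
	#endif
		return 1;
	}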
[parent not found: <alpine.DEB.2.02.1301171547370.2774@gentwo.org>]
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
[not found] ` <alpine.DEB.2.02.1301171547370.2774@gentwo.org>
@ 2013-01-17 21:56 ` Christoph Lameter
2013-01-17 22:10 ` Steven Rostedt
0 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2013-01-17 21:56 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 17 Jan 2013, Christoph Lameter wrote:
> Ditto which leaves us with:
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-17 15:49:57.417491975 -0600
> +++ linux/mm/slub.c 2013-01-17 15:50:49.010287150 -0600
> @@ -1993,8 +1993,9 @@ static inline void flush_slab(struct kme
> deactivate_slab(s, c->page, c->freelist);
>
> c->tid = next_tid(c->tid);
> - c->page = NULL;
> c->freelist = NULL;
> + barrier();
> + c->page = NULL;
> }
But the larger question is why is flush_slab() called with interrupts
enabled?
RT?
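Applied, the reordering quoted above would leave flush_slab() looking roughly like this (a sketch; any stats bookkeeping at the top of the function is elided):

	static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
	{
		deactivate_slab(s, c->page, c->freelist);

		c->tid = next_tid(c->tid);
		c->freelist = NULL;
		barrier();	/* compiler barrier: clear freelist before page */
		c->page = NULL;
	}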
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 21:56 ` Christoph Lameter
@ 2013-01-17 22:10 ` Steven Rostedt
0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2013-01-17 22:10 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 2013-01-17 at 21:56 +0000, Christoph Lameter wrote:
> on Thu, 17 Jan 2013, Christoph Lameter wrote:
>
> > Ditto which leaves us with:
> >
> > Index: linux/mm/slub.c
> > ===================================================================
> > --- linux.orig/mm/slub.c 2013-01-17 15:49:57.417491975 -0600
> > +++ linux/mm/slub.c 2013-01-17 15:50:49.010287150 -0600
> > @@ -1993,8 +1993,9 @@ static inline void flush_slab(struct kme
> > deactivate_slab(s, c->page, c->freelist);
> >
> > c->tid = next_tid(c->tid);
> > - c->page = NULL;
> > c->freelist = NULL;
> > + barrier();
> > + c->page = NULL;
> > }
>
> But the larger question is why is flush_slab() called with interrupts
> enabled?
>
> RT?
Could be; there are a few tweaks made in -rt. I'll put in a WARN_ON_ONCE()
that triggers if flush_slab() is called with interrupts enabled, just to
see where it happens.
-- Steve
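Such a check could be as small as a single line at the top of flush_slab(), for example (illustrative only, not a posted patch):

	/* Report once if flush_slab() is ever entered with interrupts on. */
	WARN_ON_ONCE(!irqs_disabled());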
* Re: [RFC][PATCH] slub: Check for page NULL before doing the node_match check
2013-01-17 18:10 [RFC][PATCH] slub: Check for page NULL before doing the node_match check Steven Rostedt
2013-01-17 18:37 ` Steven Rostedt
@ 2013-01-17 21:22 ` Christoph Lameter
1 sibling, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2013-01-17 21:22 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, linux-mm, Andrew Morton, Pekka Enberg, Matt Mackall,
Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves
On Thu, 17 Jan 2013, Steven Rostedt wrote:
> Anyway, looking at where this crashed, it seems that the page variable
> can be NULL when passed to the node_match() function (which does not
> check if it is NULL). When this happens we get the above panic.
>
> As page is only used in slab_alloc() to check if the node matches, if
> it's NULL I'm assuming that we can say it doesn't and call the
> __slab_alloc() code. Is this a correct assumption?
c->page should only be NULL when c->freelist == NULL, but obviously there
are race conditions where c->freelist has not yet been zapped while c->page
already was.
End of thread (newest message: 2013-01-21 12:19 UTC).
Thread overview: 34+ messages
2013-01-17 18:10 [RFC][PATCH] slub: Check for page NULL before doing the node_match check Steven Rostedt
2013-01-17 18:37 ` Steven Rostedt
2013-01-17 21:28 ` Christoph Lameter
2013-01-17 21:36 ` Eric Dumazet
2013-01-17 21:44 ` Steven Rostedt
2013-01-17 21:43 ` Steven Rostedt
2013-01-17 21:51 ` Christoph Lameter
2013-01-17 22:07 ` Steven Rostedt
2013-01-17 22:46 ` Steven Rostedt
2013-01-17 23:10 ` Steven Rostedt
2013-01-17 23:20 ` [RFC][PATCH] slub: Keep page and object in sync in slab_alloc_node() Steven Rostedt
2013-01-18 0:23 ` Steven Rostedt
2013-01-18 0:28 ` [RFC][PATCH v2] " Steven Rostedt
2013-01-18 4:42 ` Joonsoo Kim
2013-01-18 14:52 ` Christoph Lameter
2013-01-18 15:29 ` Steven Rostedt
2013-01-18 14:44 ` Christoph Lameter
2013-01-18 15:04 ` Steven Rostedt
2013-01-18 15:55 ` Steven Rostedt
2013-01-18 18:29 ` Christoph Lameter
2013-01-18 18:52 ` Steven Rostedt
2013-01-21 1:48 ` Christoph Lameter
2013-01-21 8:11 ` Joonsoo Kim
2013-01-21 12:19 ` Steven Rostedt
2013-01-18 18:23 ` Christoph Lameter
2013-01-18 15:09 ` [RFC][PATCH v3] " Steven Rostedt
2013-01-18 18:40 ` Christoph Lameter
2013-01-18 19:09 ` Eric Dumazet
2013-01-18 19:20 ` Steven Rostedt
2013-01-21 1:40 ` Christoph Lameter
2013-01-18 14:43 ` [RFC][PATCH] slub: Check for page NULL before doing the node_match check Christoph Lameter
[not found] ` <alpine.DEB.2.02.1301171547370.2774@gentwo.org>
2013-01-17 21:56 ` Christoph Lameter
2013-01-17 22:10 ` Steven Rostedt
2013-01-17 21:22 ` Christoph Lameter