Re: INFO: possible recursive locking detected: get_partial_node() on 3.2-rc1

From: Yong Zhang <yong.zhang0@gmail.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: Christoph Lameter <cl@linux.com>,
	Julie Sullivan <kernelmail.jms@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Pekka Enberg <penberg@kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: INFO: possible recursive locking detected: get_partial_node() on 3.2-rc1
Date: Tue, 15 Nov 2011 15:22:52 +0800
Message-ID: <20111115072251.GA10389@zhy>
In-Reply-To: <1321248853.22361.280.camel@sli10-conroe>

On Mon, Nov 14, 2011 at 01:34:13PM +0800, Shaohua Li wrote:
> On Fri, 2011-11-11 at 23:02 +0800, Christoph Lameter wrote:
> > On Fri, 11 Nov 2011, Shaohua Li wrote:
> > 
> > > > Looks like this could be a real deadlock. We hold a lock to free an object,
> > > > but the free needs to allocate a new object. If the new object and the freed
> > > > object are from the same slab, there is a deadlock.
> > 
> > > unfreeze_partials() is never called when going through get_partial_node(),
> > > so there is no deadlock AFAICT.
> unfreeze_partials() isn't called from get_partial_node(). I thought the
> code path is something like this: kmem_cache_free()->put_cpu_partial()
> (hold lock) ->unfreeze_partials() ->discard_slab() ->debug_object_init()
> ->kmem_cache_alloc()->get_partial_node() (hold lock). Not sure if this will
> really happen, but it looks like a deadlock.
> But anyway, discard_slab() can be moved out of unfreeze_partials().
> 
> > > > discard_slab() doesn't need to hold the lock if the slab is already removed
> > > > from the partial list. How about the patch below? Only compile tested.
> > 
> > In general I think it is good to move the call to discard_slab() out from
> > under the list_lock in unfreeze_partials(). Could you fold
> > discard_page_list into unfreeze_partials()? __flush_cpu_slab still calls
> > discard_page_list with disabled interrupts even after your patch.
> I was afraid there would be an alloc-in-atomic error, but Yong & Julie's tests
> show I was overthinking this. Here is the updated patch. Yong & Julie, I
> added your Reported-and-tested-by tags, because the new patch should behave
> just like the old one, but since I changed it a little bit, can you please
> have a quick check? Thanks!
> 
> 
> 
> Subject: slub: move discard_slab out of node lock
> 
> Lockdep reports a potential deadlock on the slub node list_lock.
> discard_slab() is called with the lock held in unfreeze_partials(),
> which could trigger a slab allocation, which could take the lock again.
> 
> discard_slab() doesn't actually need the lock if the slab has already
> been removed from the partial list.
> 
> Reported-and-tested-by: Yong Zhang <yong.zhang0@gmail.com>
> Reported-and-tested-by: Julie Sullivan <kernelmail.jms@gmail.com>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>

Tested-by: Yong Zhang <yong.zhang0@gmail.com>

Thanks,
Yong

> ---
>  mm/slub.c |   16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2011-11-11 16:17:39.000000000 +0800
> +++ linux/mm/slub.c	2011-11-14 13:11:11.000000000 +0800
> @@ -1862,7 +1862,7 @@ static void unfreeze_partials(struct kme
>  {
>  	struct kmem_cache_node *n = NULL;
>  	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
> -	struct page *page;
> +	struct page *page, *discard_page = NULL;
>  
>  	while ((page = c->partial)) {
>  		enum slab_modes { M_PARTIAL, M_FREE };
> @@ -1916,14 +1916,22 @@ static void unfreeze_partials(struct kme
>  				"unfreezing slab"));
>  
>  		if (m == M_FREE) {
> -			stat(s, DEACTIVATE_EMPTY);
> -			discard_slab(s, page);
> -			stat(s, FREE_SLAB);
> +			page->next = discard_page;
> +			discard_page = page;
>  		}
>  	}
>  
>  	if (n)
>  		spin_unlock(&n->list_lock);
> +
> +	while (discard_page) {
> +		page = discard_page;
> +		discard_page = discard_page->next;
> +
> +		stat(s, DEACTIVATE_EMPTY);
> +		discard_slab(s, page);
> +		stat(s, FREE_SLAB);
> +	}
>  }
>  
>  /*
> 
> 

-- 
Only stand for myself
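
For readers who want to see the idea of the patch outside the kernel, here
is a minimal single-threaded userspace sketch of the same pattern: chain the
empty slabs onto a private list while the lock is held, then discard them
only after the lock is dropped. Everything named toy_* is hypothetical and
is not kernel API. The toy lock is a plain flag that asserts on a second
acquire, standing in for the self-deadlock a real non-recursive spinlock
would hit, and toy_discard() takes the lock itself to model discard_slab()
re-entering the allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Toy non-recursive lock (assumption: single-threaded model). A second
 * acquire would deadlock a real spinlock; here it trips the assert. */
struct toy_lock { int held; };

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

struct toy_slab {
	struct toy_slab *next;
	int inuse;			/* objects still allocated from this slab */
};

struct toy_node {
	struct toy_lock list_lock;
	struct toy_slab *partial;	/* singly linked partial list */
	int discarded;			/* number of slabs discarded so far */
};

/* Stand-in for discard_slab(): models a free path that allocates and
 * therefore takes list_lock again. Must not run with list_lock held. */
static void toy_discard(struct toy_node *n, struct toy_slab *slab)
{
	toy_acquire(&n->list_lock);	/* the nested allocation path */
	toy_release(&n->list_lock);
	slab->next = NULL;
	n->discarded++;
}

/* Sort pending slabs under the lock; discard the empty ones afterwards.
 * Returns how many slabs were kept on the partial list. */
static int toy_unfreeze(struct toy_node *n, struct toy_slab *pending)
{
	struct toy_slab *slab, *discard = NULL;
	int kept = 0;

	toy_acquire(&n->list_lock);
	while ((slab = pending)) {
		pending = slab->next;
		if (slab->inuse == 0) {
			/* Empty: defer the discard, just chain it up. */
			slab->next = discard;
			discard = slab;
		} else {
			slab->next = n->partial;
			n->partial = slab;
			kept++;
		}
	}
	toy_release(&n->list_lock);

	/* The lock is dropped, so toy_discard() may safely take it. */
	while (discard) {
		slab = discard;
		discard = discard->next;
		toy_discard(n, slab);
	}
	return kept;
}
```

Calling toy_discard() from inside the first loop instead would trip the
assert in toy_acquire(), which is the toy equivalent of the recursion
lockdep warned about.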


