From: Peter Zijlstra <peterz@infradead.org>
To: David Rientjes <rientjes@google.com>
Cc: Sasha Levin <levinsasha928@gmail.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Dave Jones <davej@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	bhutchings@solarflare.com,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hugh Dickins <hughd@google.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch for-3.7] mm, mempolicy: fix printing stack contents in numa_maps
Date: Thu, 25 Oct 2012 16:39:32 +0200
Message-ID: <1351175972.12171.14.camel@twins>
In-Reply-To: <1351167554.23337.14.camel@twins>

On Thu, 2012-10-25 at 14:19 +0200, Peter Zijlstra wrote:
> On Wed, 2012-10-24 at 17:08 -0700, David Rientjes wrote:
> > Ok, this looks the same but it's actually a different issue: 
> > mpol_misplaced(), which now only exists in linux-next and not in 3.7-rc2, 
> > calls get_vma_policy() which may take the shared policy mutex.  This 
> > happens while holding page_table_lock from do_huge_pmd_numa_page() but 
> > also from do_numa_page() while holding a spinlock on the ptl, which is 
> > coming from the sched/numa branch.
> > 
> > Is there any way that we can avoid changing the shared policy mutex back 
> > into a spinlock (it was converted in b22d127a39dd ["mempolicy: fix a race 
> > in shared_policy_replace()"])?
> > 
> > Adding Peter, Rik, and Mel to the cc. 
> 
> Urgh, crud I totally missed that.
> 
> So the problem is that we need to compute if the current page is placed
> 'right' while holding pte_lock in order to avoid multiple pte_lock
> acquisitions on the 'fast' path.
> 
> I'll look into this in a bit, but one thing that comes to mind is having
> both a spinlock and a mutex and requiring both to be held for
> modification, while either one is sufficient for read.
> 
> That would allow sp_lookup() to use the spinlock, while insert and
> replace can hold both.
> 
> Not sure it will work for this, need to stare at this code a little
> more.

So I think the below should work: we hold the spinlock over both the
rb-tree modification and the sp free, so mpol_shared_policy_lookup(),
which returns the policy with an incremented refcount, works with just
the spinlock.
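
To make the locking rule concrete, here is a rough user-space sketch of
the same idea, not the kernel code itself. Everything in it is made up
for illustration: struct policy_store, store_insert(), store_delete()
and store_lookup() are hypothetical names, a linked list stands in for
the rb-tree, and pthread locks plus C11 atomics stand in for the kernel
primitives. Writers take the mutex and then the spinlock around the
actual modification and the free; the lookup takes only the spinlock
and grabs its reference before dropping it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct mempolicy and struct sp_node. */
struct policy {
	atomic_int refcnt;
	int mode;
};

struct range_node {
	unsigned long start, end;	/* covers [start, end) */
	struct policy *pol;
	struct range_node *next;
};

struct policy_store {
	struct range_node *head;	/* a list stands in for the rb-tree */
	pthread_spinlock_t lock;	/* inner lock, enough for readers */
	pthread_mutex_t mutex;		/* outer lock, serializes writers */
};

static void store_init(struct policy_store *s)
{
	s->head = NULL;
	pthread_spin_init(&s->lock, PTHREAD_PROCESS_PRIVATE);
	pthread_mutex_init(&s->mutex, NULL);
}

/* Drop one reference; the last one frees the policy. */
static void policy_put(struct policy *p)
{
	if (atomic_fetch_sub(&p->refcnt, 1) == 1)
		free(p);
}

/* Writer: allocate first, then link while holding both locks. */
static int store_insert(struct policy_store *s, unsigned long start,
			unsigned long end, int mode)
{
	struct range_node *n = malloc(sizeof(*n));
	struct policy *p = malloc(sizeof(*p));

	assert(start < end);
	if (!n || !p) {
		free(n);
		free(p);
		return -1;
	}
	atomic_init(&p->refcnt, 1);	/* reference owned by the store */
	p->mode = mode;
	n->start = start;
	n->end = end;
	n->pol = p;

	pthread_mutex_lock(&s->mutex);
	pthread_spin_lock(&s->lock);
	n->next = s->head;
	s->head = n;
	pthread_spin_unlock(&s->lock);
	pthread_mutex_unlock(&s->mutex);
	return 0;
}

/* Writer: the mutex alone keeps the list stable while searching;
 * the spinlock covers the unlink and the free. */
static int store_delete(struct policy_store *s, unsigned long start)
{
	struct range_node **pp, *n;
	int removed = 0;

	pthread_mutex_lock(&s->mutex);
	for (pp = &s->head; (n = *pp) != NULL; pp = &n->next)
		if (n->start == start)
			break;
	if (n) {
		pthread_spin_lock(&s->lock);
		*pp = n->next;
		policy_put(n->pol);	/* drop the store's reference */
		free(n);
		pthread_spin_unlock(&s->lock);
		removed = 1;
	}
	pthread_mutex_unlock(&s->mutex);
	return removed;
}

/* Reader: spinlock only; take the reference before unlocking. */
static struct policy *store_lookup(struct policy_store *s, unsigned long idx)
{
	struct policy *p = NULL;
	struct range_node *n;

	pthread_spin_lock(&s->lock);
	for (n = s->head; n; n = n->next) {
		if (idx >= n->start && idx < n->end) {
			p = n->pol;
			atomic_fetch_add(&p->refcnt, 1);
			break;
		}
	}
	pthread_spin_unlock(&s->lock);
	return p;
}
```

The caller of store_lookup() owns the returned reference and drops it
with policy_put(). Because the unlink and the free both happen under
the spinlock, a reader can never find a node and then lose it before
taking its reference.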

Comments?

---
 include/linux/mempolicy.h |    1 +
 mm/mempolicy.c            |   23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)

--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -133,6 +133,7 @@ struct sp_node {
 
 struct shared_policy {
 	struct rb_root root;
+	spinlock_t lock;
 	struct mutex mutex;
 };
 
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2099,12 +2099,20 @@ bool __mpol_equal(struct mempolicy *a, s
  *
  * Remember policies even when nobody has shared memory mapped.
  * The policies are kept in Red-Black tree linked from the inode.
- * They are protected by the sp->lock spinlock, which should be held
- * for any accesses to the tree.
+ *
+ * The rb-tree is locked using both a mutex and a spinlock. Every modification
+ * to the tree must hold both the mutex and the spinlock, lookups can hold
+ * either to observe a stable tree.
+ *
+ * In particular, sp_insert() and sp_delete() take the spinlock, whereas
+ * sp_lookup() doesn't, so that callers can choose which lock to hold.
+ *
+ * shared_policy_replace() and mpol_free_shared_policy() take the mutex
+ * and call sp_insert(), sp_delete().
  */
 
 /* lookup first element intersecting start-end */
-/* Caller holds sp->mutex */
+/* Caller holds sp->lock and/or sp->mutex */
 static struct sp_node *
 sp_lookup(struct shared_policy *sp, unsigned long start, unsigned long end)
 {
@@ -2143,6 +2151,7 @@ static void sp_insert(struct shared_poli
 	struct rb_node *parent = NULL;
 	struct sp_node *nd;
 
+	spin_lock(&sp->lock);
 	while (*p) {
 		parent = *p;
 		nd = rb_entry(parent, struct sp_node, nd);
@@ -2155,6 +2164,7 @@ static void sp_insert(struct shared_poli
 	}
 	rb_link_node(&new->nd, parent, p);
 	rb_insert_color(&new->nd, &sp->root);
+	spin_unlock(&sp->lock);
 	pr_debug("inserting %lx-%lx: %d\n", new->start, new->end,
 		 new->policy ? new->policy->mode : 0);
 }
@@ -2168,13 +2178,13 @@ mpol_shared_policy_lookup(struct shared_
 
 	if (!sp->root.rb_node)
 		return NULL;
-	mutex_lock(&sp->mutex);
+	spin_lock(&sp->lock);
 	sn = sp_lookup(sp, idx, idx+1);
 	if (sn) {
 		mpol_get(sn->policy);
 		pol = sn->policy;
 	}
-	mutex_unlock(&sp->mutex);
+	spin_unlock(&sp->lock);
 	return pol;
 }
 
@@ -2295,8 +2305,10 @@ int mpol_misplaced(struct page *page, st
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
 	pr_debug("deleting %lx-l%lx\n", n->start, n->end);
+	spin_lock(&sp->lock);
 	rb_erase(&n->nd, &sp->root);
 	sp_free(n);
+	spin_unlock(&sp->lock);
 }
 
 static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
@@ -2381,6 +2393,7 @@ void mpol_shared_policy_init(struct shar
 	int ret;
 
 	sp->root = RB_ROOT;		/* empty tree == default mempolicy */
+	spin_lock_init(&sp->lock);
 	mutex_init(&sp->mutex);
 
 	if (mpol) {

