From: Peter Zijlstra <peterz@infradead.org>
To: David Rientjes <rientjes@google.com>
Cc: Sasha Levin <levinsasha928@gmail.com>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Dave Jones <davej@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
bhutchings@solarflare.com,
Konstantin Khlebnikov <khlebnikov@openvz.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Hugh Dickins <hughd@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch for-3.7] mm, mempolicy: fix printing stack contents in numa_maps
Date: Thu, 25 Oct 2012 16:39:32 +0200
Message-ID: <1351175972.12171.14.camel@twins>
In-Reply-To: <1351167554.23337.14.camel@twins>
On Thu, 2012-10-25 at 14:19 +0200, Peter Zijlstra wrote:
> On Wed, 2012-10-24 at 17:08 -0700, David Rientjes wrote:
> > Ok, this looks the same but it's actually a different issue:
> > mpol_misplaced(), which now only exists in linux-next and not in 3.7-rc2,
> > calls get_vma_policy() which may take the shared policy mutex. This
> > happens while holding page_table_lock from do_huge_pmd_numa_page() but
> > also from do_numa_page() while holding a spinlock on the ptl, which is
> > coming from the sched/numa branch.
> >
> > Is there any way that we can avoid changing the shared policy mutex back
> > into a spinlock (it was converted in b22d127a39dd ["mempolicy: fix a race
> > in shared_policy_replace()"])?
> >
> > Adding Peter, Rik, and Mel to the cc.
>
> Urgh, crud I totally missed that.
>
> So the problem is that we need to compute if the current page is placed
> 'right' while holding pte_lock in order to avoid multiple pte_lock
> acquisitions on the 'fast' path.
>
> I'll look into this in a bit, but one thing that comes to mind is having
> both a spinlock and a mutex and require holding both for modification
> while either one is sufficient for read.
>
> That would allow sp_lookup() to use the spinlock, while insert and
> replace can hold both.
>
> Not sure it will work for this, need to stare at this code a little
> more.
So I think the below should work: we hold the spinlock over both the
rb-tree modification and sp_free(), which lets
mpol_shared_policy_lookup(), which returns the policy with an
incremented refcount, work with just the spinlock.
Comments?
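To make the locking rule concrete, here is a minimal userspace sketch of
"writers hold both locks, readers hold either one". It is not the kernel
code: a singly linked list stands in for the rb-tree, a C11 atomic_flag
stands in for spinlock_t, a pthread mutex stands in for struct mutex, and
every demo_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel spinlock: busy-waits, never sleeps. */
struct demo_spinlock { atomic_flag held; };

static void demo_spin_lock(struct demo_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct demo_node {
	unsigned long start, end;	/* covers [start, end) */
	int policy;
	struct demo_node *next;
};

struct demo_shared {
	struct demo_node *head;		/* stands in for the rb-tree root */
	struct demo_spinlock lock;	/* sufficient for readers */
	pthread_mutex_t mutex;		/* serialises writers, may sleep */
};

static void demo_init(struct demo_shared *sp)
{
	sp->head = NULL;
	atomic_flag_clear(&sp->lock.held);
	pthread_mutex_init(&sp->mutex, NULL);
}

/* Caller holds sp->lock and/or sp->mutex, so the list cannot change. */
static struct demo_node *demo_lookup(struct demo_shared *sp,
				     unsigned long start, unsigned long end)
{
	struct demo_node *n;

	for (n = sp->head; n; n = n->next)
		if (n->start < end && start < n->end)
			return n;	/* an element intersecting start-end */
	return NULL;
}

/* Caller holds sp->mutex; the spinlock covers only the actual link step. */
static void demo_insert(struct demo_shared *sp, struct demo_node *new)
{
	assert(new->start < new->end);
	demo_spin_lock(&sp->lock);
	new->next = sp->head;
	sp->head = new;
	demo_spin_unlock(&sp->lock);
}

/* Writer path: takes the mutex, so sleeping work can happen outside the spinlock. */
static void demo_replace(struct demo_shared *sp, struct demo_node *new)
{
	pthread_mutex_lock(&sp->mutex);
	demo_insert(sp, new);
	pthread_mutex_unlock(&sp->mutex);
}

/* Reader path: spinlock only, so it is usable where sleeping is not allowed. */
static int demo_policy_lookup(struct demo_shared *sp, unsigned long idx)
{
	struct demo_node *n;
	int pol = -1;			/* -1 means "no policy found" */

	demo_spin_lock(&sp->lock);
	n = demo_lookup(sp, idx, idx + 1);
	if (n)
		pol = n->policy;	/* copy out while the node cannot go away */
	demo_spin_unlock(&sp->lock);
	return pol;
}
```

Copying the value out before dropping the spinlock plays the role that
mpol_get() plays in the patch: the reader must secure what it needs from
the node while the node is still guaranteed to exist.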
---
include/linux/mempolicy.h | 1 +
mm/mempolicy.c | 23 ++++++++++++++++++-----
2 files changed, 19 insertions(+), 5 deletions(-)
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -133,6 +133,7 @@ struct sp_node {
struct shared_policy {
struct rb_root root;
+ spinlock_t lock;
struct mutex mutex;
};
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2099,12 +2099,20 @@ bool __mpol_equal(struct mempolicy *a, s
*
* Remember policies even when nobody has shared memory mapped.
* The policies are kept in Red-Black tree linked from the inode.
- * They are protected by the sp->lock spinlock, which should be held
- * for any accesses to the tree.
+ *
+ * The rb-tree is locked using both a mutex and a spinlock. Every modification
+ * to the tree must hold both the mutex and the spinlock, lookups can hold
+ * either to observe a stable tree.
+ *
+ * In particular, sp_insert() and sp_delete() take the spinlock, whereas
+ * sp_lookup() doesn't, so that callers can choose which lock to hold.
+ *
+ * shared_policy_replace() and mpol_free_shared_policy() take the mutex
+ * and call sp_insert(), sp_delete().
*/
/* lookup first element intersecting start-end */
-/* Caller holds sp->mutex */
+/* Caller holds sp->lock and/or sp->mutex */
static struct sp_node *
sp_lookup(struct shared_policy *sp, unsigned long start, unsigned long end)
{
@@ -2143,6 +2151,7 @@ static void sp_insert(struct shared_poli
struct rb_node *parent = NULL;
struct sp_node *nd;
+ spin_lock(&sp->lock);
while (*p) {
parent = *p;
nd = rb_entry(parent, struct sp_node, nd);
@@ -2155,6 +2164,7 @@ static void sp_insert(struct shared_poli
}
rb_link_node(&new->nd, parent, p);
rb_insert_color(&new->nd, &sp->root);
+ spin_unlock(&sp->lock);
pr_debug("inserting %lx-%lx: %d\n", new->start, new->end,
new->policy ? new->policy->mode : 0);
}
@@ -2168,13 +2178,13 @@ mpol_shared_policy_lookup(struct shared_
if (!sp->root.rb_node)
return NULL;
- mutex_lock(&sp->mutex);
+ spin_lock(&sp->lock);
sn = sp_lookup(sp, idx, idx+1);
if (sn) {
mpol_get(sn->policy);
pol = sn->policy;
}
- mutex_unlock(&sp->mutex);
+ spin_unlock(&sp->lock);
return pol;
}
@@ -2295,8 +2305,10 @@ int mpol_misplaced(struct page *page, st
static void sp_delete(struct shared_policy *sp, struct sp_node *n)
{
pr_debug("deleting %lx-l%lx\n", n->start, n->end);
+ spin_lock(&sp->lock);
rb_erase(&n->nd, &sp->root);
sp_free(n);
+ spin_unlock(&sp->lock);
}
static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
@@ -2381,6 +2393,7 @@ void mpol_shared_policy_init(struct shar
int ret;
sp->root = RB_ROOT; /* empty tree == default mempolicy */
+ spin_lock_init(&sp->lock);
mutex_init(&sp->mutex);
if (mpol) {