Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Miao Xie <miaox@cn.fujitsu.com>
To: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Paul Menage <menage@google.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Li Zefan <lizf@cn.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily
Date: Wed, 31 Mar 2010 17:54:34 +0800	[thread overview]
Message-ID: <4BB31BDA.8080203@cn.fujitsu.com> (raw)
In-Reply-To: <20100311110317.GL5812@laptop>

on 2010-3-11 19:03, Nick Piggin wrote:
>> Ok, I try to make a new patch by using seqlock.
> 
> Well... I do think seqlocks would be a bit simpler because they don't
> require this checking and synchronizing of this patch.
> 
> But you are right: on non-x86 architectures seqlocks would probably be
> more costly than your patch in the fastpaths. Unless you can avoid
> using the seqlock in fastpaths and just have callers handle the rare
> case of an empty nodemask.
> 
> cpuset_node_allowed_*wall doesn't need anything because it is just
> interested in one bit in the mask.
> 
> cpuset_mem_spread_node doesn't matter because it will loop around and
> try again if it doesn't find any nodes online.
> 
> cpuset_mems_allowed seems totally broken anyway
> 
> etc.
> 
> This approach might take a little more work, but I think it might be the
> best way. 

Hi, Nick Piggin

The following is the new patch made by your idea. Could you review it?

Thanks!
Miao
---
 kernel/cpuset.c |   11 +++++++++--
 mm/mempolicy.c  |   20 +++++++++++++++++---
 mm/mmzone.c     |   15 +++++++++++----
 3 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index d109467..fbb1f1c 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -952,8 +952,6 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 static void cpuset_change_task_nodemask(struct task_struct *tsk,
 					nodemask_t *newmems)
 {
-	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
-	mpol_rebind_task(tsk, &tsk->mems_allowed);
 	mpol_rebind_task(tsk, newmems);
 	tsk->mems_allowed = *newmems;
 }
@@ -2442,6 +2440,15 @@ int cpuset_mem_spread_node(void)
 	node = next_node(current->cpuset_mem_spread_rotor, current->mems_allowed);
 	if (node == MAX_NUMNODES)
 		node = first_node(current->mems_allowed);
+
+	/*
+	 * if node is still MAX_NUMNODES, that means the kernel allocator saw
+	 * an empty nodemask. In that case, the kernel allocator allocate
+	 * memory on the current node.
+	 */
+	if (unlikely(node == MAX_NUMNODES))
+		node = numa_node_id();
+
 	current->cpuset_mem_spread_rotor = node;
 	return node;
 }
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8034abd..75e819e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1449,8 +1449,16 @@ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy)
 		 * the first node in the mask instead.
 		 */
 		if (unlikely(gfp & __GFP_THISNODE) &&
-				unlikely(!node_isset(nd, policy->v.nodes)))
+				unlikely(!node_isset(nd, policy->v.nodes))) {
 			nd = first_node(policy->v.nodes);
+			/*
+			 * When rebinding task->mempolicy, th kernel allocator
+			 * may see an empty nodemask, and first_node() returns
+			 * MAX_NUMNODES, In that case, we will use current node.
+			 */
+			if (unlikely(nd == MAX_NUMNODES))
+				nd = numa_node_id();
+		}
 		break;
 	case MPOL_INTERLEAVE: /* should not happen */
 		break;
@@ -1522,17 +1530,21 @@ unsigned slab_node(struct mempolicy *policy)
 static unsigned offset_il_node(struct mempolicy *pol,
 		struct vm_area_struct *vma, unsigned long off)
 {
-	unsigned nnodes = nodes_weight(pol->v.nodes);
+	nodemask_t tmp_nodes;
+	unsigned nnodes;
 	unsigned target;
 	int c;
 	int nid = -1;
 
+	tmp_nodes = pol->v.nodes;
+	nnodes = nodes_weight(tmp_nodes);
 	if (!nnodes)
 		return numa_node_id();
+
 	target = (unsigned int)off % nnodes;
 	c = 0;
 	do {
-		nid = next_node(nid, pol->v.nodes);
+		nid = next_node(nid, tmp_nodes);
 		c++;
 	} while (c <= target);
 	return nid;
@@ -1631,7 +1643,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
+		task_lock(current);
 		*mask =  mempolicy->v.nodes;
+		task_unlock(current);
 		break;
 
 	default:
diff --git a/mm/mmzone.c b/mm/mmzone.c
index f5b7d17..43ac21b 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -58,6 +58,7 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
 					nodemask_t *nodes,
 					struct zone **zone)
 {
+	nodemask_t tmp_nodes;
 	/*
 	 * Find the next suitable zone to use for the allocation.
 	 * Only filter based on nodemask if it's set
@@ -65,10 +66,16 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
 	if (likely(nodes == NULL))
 		while (zonelist_zone_idx(z) > highest_zoneidx)
 			z++;
-	else
-		while (zonelist_zone_idx(z) > highest_zoneidx ||
-				(z->zone && !zref_in_nodemask(z, nodes)))
-			z++;
+	else {
+		tmp_nodes = *nodes;
+		if (nodes_empty(tmp_nodes))
+			while (zonelist_zone_idx(z) > highest_zoneidx)
+				z++;
+		else
+			while (zonelist_zone_idx(z) > highest_zoneidx ||
+				(z->zone && !zref_in_nodemask(z, &tmp_nodes)))
+				z++;
+	}
 
 	*zone = zonelist_zone(z);
 	return z;
-- 
1.6.5.2

WARNING: multiple messages have this Message-ID (diff)

From: Miao Xie <miaox@cn.fujitsu.com>
To: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Paul Menage <menage@google.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Li Zefan <lizf@cn.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily
Date: Wed, 31 Mar 2010 17:54:34 +0800	[thread overview]
Message-ID: <4BB31BDA.8080203@cn.fujitsu.com> (raw)
In-Reply-To: <20100311110317.GL5812@laptop>

on 2010-3-11 19:03, Nick Piggin wrote:
>> Ok, I try to make a new patch by using seqlock.
> 
> Well... I do think seqlocks would be a bit simpler because they don't
> require this checking and synchronizing of this patch.
> 
> But you are right: on non-x86 architectures seqlocks would probably be
> more costly than your patch in the fastpaths. Unless you can avoid
> using the seqlock in fastpaths and just have callers handle the rare
> case of an empty nodemask.
> 
> cpuset_node_allowed_*wall doesn't need anything because it is just
> interested in one bit in the mask.
> 
> cpuset_mem_spread_node doesn't matter because it will loop around and
> try again if it doesn't find any nodes online.
> 
> cpuset_mems_allowed seems totally broken anyway
> 
> etc.
> 
> This approach might take a little more work, but I think it might be the
> best way. 

Hi, Nick Piggin

The following is the new patch made by your idea. Could you review it?

Thanks!
Miao
---
 kernel/cpuset.c |   11 +++++++++--
 mm/mempolicy.c  |   20 +++++++++++++++++---
 mm/mmzone.c     |   15 +++++++++++----
 3 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index d109467..fbb1f1c 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -952,8 +952,6 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 static void cpuset_change_task_nodemask(struct task_struct *tsk,
 					nodemask_t *newmems)
 {
-	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
-	mpol_rebind_task(tsk, &tsk->mems_allowed);
 	mpol_rebind_task(tsk, newmems);
 	tsk->mems_allowed = *newmems;
 }
@@ -2442,6 +2440,15 @@ int cpuset_mem_spread_node(void)
 	node = next_node(current->cpuset_mem_spread_rotor, current->mems_allowed);
 	if (node == MAX_NUMNODES)
 		node = first_node(current->mems_allowed);
+
+	/*
+	 * if node is still MAX_NUMNODES, that means the kernel allocator saw
+	 * an empty nodemask. In that case, the kernel allocator allocate
+	 * memory on the current node.
+	 */
+	if (unlikely(node == MAX_NUMNODES))
+		node = numa_node_id();
+
 	current->cpuset_mem_spread_rotor = node;
 	return node;
 }
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8034abd..75e819e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1449,8 +1449,16 @@ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy)
 		 * the first node in the mask instead.
 		 */
 		if (unlikely(gfp & __GFP_THISNODE) &&
-				unlikely(!node_isset(nd, policy->v.nodes)))
+				unlikely(!node_isset(nd, policy->v.nodes))) {
 			nd = first_node(policy->v.nodes);
+			/*
+			 * When rebinding task->mempolicy, th kernel allocator
+			 * may see an empty nodemask, and first_node() returns
+			 * MAX_NUMNODES, In that case, we will use current node.
+			 */
+			if (unlikely(nd == MAX_NUMNODES))
+				nd = numa_node_id();
+		}
 		break;
 	case MPOL_INTERLEAVE: /* should not happen */
 		break;
@@ -1522,17 +1530,21 @@ unsigned slab_node(struct mempolicy *policy)
 static unsigned offset_il_node(struct mempolicy *pol,
 		struct vm_area_struct *vma, unsigned long off)
 {
-	unsigned nnodes = nodes_weight(pol->v.nodes);
+	nodemask_t tmp_nodes;
+	unsigned nnodes;
 	unsigned target;
 	int c;
 	int nid = -1;
 
+	tmp_nodes = pol->v.nodes;
+	nnodes = nodes_weight(tmp_nodes);
 	if (!nnodes)
 		return numa_node_id();
+
 	target = (unsigned int)off % nnodes;
 	c = 0;
 	do {
-		nid = next_node(nid, pol->v.nodes);
+		nid = next_node(nid, tmp_nodes);
 		c++;
 	} while (c <= target);
 	return nid;
@@ -1631,7 +1643,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	case MPOL_BIND:
 		/* Fall through */
 	case MPOL_INTERLEAVE:
+		task_lock(current);
 		*mask =  mempolicy->v.nodes;
+		task_unlock(current);
 		break;
 
 	default:
diff --git a/mm/mmzone.c b/mm/mmzone.c
index f5b7d17..43ac21b 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -58,6 +58,7 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
 					nodemask_t *nodes,
 					struct zone **zone)
 {
+	nodemask_t tmp_nodes;
 	/*
 	 * Find the next suitable zone to use for the allocation.
 	 * Only filter based on nodemask if it's set
@@ -65,10 +66,16 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
 	if (likely(nodes == NULL))
 		while (zonelist_zone_idx(z) > highest_zoneidx)
 			z++;
-	else
-		while (zonelist_zone_idx(z) > highest_zoneidx ||
-				(z->zone && !zref_in_nodemask(z, nodes)))
-			z++;
+	else {
+		tmp_nodes = *nodes;
+		if (nodes_empty(tmp_nodes))
+			while (zonelist_zone_idx(z) > highest_zoneidx)
+				z++;
+		else
+			while (zonelist_zone_idx(z) > highest_zoneidx ||
+				(z->zone && !zref_in_nodemask(z, &tmp_nodes)))
+				z++;
+	}
 
 	*zone = zonelist_zone(z);
 	return z;
-- 
1.6.5.2


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-03-31  9:54 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-08 10:10 [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily Miao Xie
2010-03-08 10:10 ` Miao Xie
2010-03-08 21:46 ` David Rientjes
2010-03-08 21:46   ` David Rientjes
2010-03-09  7:25   ` Miao Xie
2010-03-09  7:25     ` Miao Xie
2010-03-11  8:15     ` Nick Piggin
2010-03-11  8:15       ` Nick Piggin
2010-03-11 10:33       ` Miao Xie
2010-03-11 10:33         ` Miao Xie
2010-03-11 11:03         ` Nick Piggin
2010-03-11 11:03           ` Nick Piggin
2010-03-25 10:23           ` Miao Xie
2010-03-25 10:23             ` Miao Xie
2010-03-25 12:56             ` Miao Xie
2010-03-25 12:56               ` Miao Xie
2010-03-25 13:33           ` [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2) (was: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily) Miao Xie
2010-03-25 13:33             ` Miao Xie
2010-03-28  5:30             ` Bob Liu
2010-03-28  5:30               ` Bob Liu
2010-03-31 19:42             ` Andrew Morton
2010-03-31 19:42               ` Andrew Morton
2010-03-31  9:54           ` Miao Xie [this message]
2010-03-31  9:54             ` [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily Miao Xie
2010-03-31 10:34             ` David Rientjes
2010-03-31 10:34               ` David Rientjes
2010-04-01  2:16               ` Miao Xie
2010-04-01  2:16                 ` Miao Xie

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:d109467 dfblob:fbb1f1c dfblob:8034abd dfblob:75e819e
dfblob:f5b7d17 dfblob:43ac21b dfblob:d109467 dfblob:fbb1f1c
dfblob:8034abd dfblob:75e819e dfblob:f5b7d17 dfblob:43ac21b )
 OR (
bs:"Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BB31BDA.8080203@cn.fujitsu.com \
    --to=miaox@cn.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=menage@google.com \
    --cc=npiggin@suse.de \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.