All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: miaox@cn.fujitsu.com
Cc: Nick Piggin <npiggin@suse.de>,
	David Rientjes <rientjes@google.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Paul Menage <menage@google.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2) (was: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily)
Date: Wed, 31 Mar 2010 12:42:01 -0700	[thread overview]
Message-ID: <20100331124201.8cb20a11.akpm@linux-foundation.org> (raw)
In-Reply-To: <4BAB6646.7040302@cn.fujitsu.com>

On Thu, 25 Mar 2010 21:33:58 +0800
Miao Xie <miaox@cn.fujitsu.com> wrote:

> on 2010-3-11 19:03, Nick Piggin wrote:
> >> Ok, I try to make a new patch by using seqlock.
> > 
> > Well... I do think seqlocks would be a bit simpler because they don't
> > require this checking and synchronizing of this patch.
> Hi, Nick Piggin
> 
> I have made a new patch which uses seqlock to protect mems_allowed and mempolicy.
> please review it.

That's an awfully big patch for a pretty small bug?

> Subject: [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2)
> 
> Before applying this patch, cpuset updates task->mems_allowed by setting all
> new bits in the nodemask first, and clearing all old unallowed bits later.
> But in the way, the allocator can see an empty nodemask, though it is infrequent.
> 
> The problem is following:
> The size of nodemask_t is greater than the size of long integer, so loading
> and storing of nodemask_t are not atomic operations. If task->mems_allowed
> don't intersect with new_mask, such as the first word of the mask is empty
> and only the first word of new_mask is not empty. When the allocator
> loads a word of the mask before
> 
> 	current->mems_allowed |= new_mask;
> 
> and then loads another word of the mask after
> 
> 	current->mems_allowed = new_mask;
> 
> the allocator gets an empty nodemask.

Probably we could fix this via careful ordering of the updates,
barriers and perhaps some avoicance action at the reader side.

> Besides that, if the size of nodemask_t is less than the size of long integer,
> there is another problem. when the kernel allocater invokes the following function,
> 
> 	struct zoneref *next_zones_zonelist(struct zoneref *z,
> 						enum zone_type highest_zoneidx,
> 						nodemask_t *nodes,
> 						struct zone **zone)
> 	{
> 		/*
> 		 * Find the next suitable zone to use for the allocation.
> 		 * Only filter based on nodemask if it's set
> 		 */
> 		if (likely(nodes == NULL))
> 			......
>  	       else
> 			while (zonelist_zone_idx(z) > highest_zoneidx ||
> 					(z->zone && !zref_in_nodemask(z, nodes)))
> 				z++;
> 
> 		*zone = zonelist_zone(z);
> 		return z;
> 	}
> 
> if we change nodemask between two calls of zref_in_nodemask(), such as
> 	Task1						Task2
> 	zref_in_nodemask(z = node0's z, nodes = 1-2)
> 	zref_in_nodemask return 0
> 							nodes = 0
> 	zref_in_nodemask(z = node1's z, nodes = 0)
> 	zref_in_nodemask return 0
> z will overflow.

And maybe we can fix this by taking a copy into a local.

> when the kernel allocater accesses task->mempolicy, there is the same problem.

And maybe we can fix those in a similar way.

But it's all too much, and we'll just break it again in the future.  So
yup, I guess locking is needed.


WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: miaox@cn.fujitsu.com
Cc: Nick Piggin <npiggin@suse.de>,
	David Rientjes <rientjes@google.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Paul Menage <menage@google.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2) (was: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily)
Date: Wed, 31 Mar 2010 12:42:01 -0700	[thread overview]
Message-ID: <20100331124201.8cb20a11.akpm@linux-foundation.org> (raw)
In-Reply-To: <4BAB6646.7040302@cn.fujitsu.com>

On Thu, 25 Mar 2010 21:33:58 +0800
Miao Xie <miaox@cn.fujitsu.com> wrote:

> on 2010-3-11 19:03, Nick Piggin wrote:
> >> Ok, I try to make a new patch by using seqlock.
> > 
> > Well... I do think seqlocks would be a bit simpler because they don't
> > require this checking and synchronizing of this patch.
> Hi, Nick Piggin
> 
> I have made a new patch which uses seqlock to protect mems_allowed and mempolicy.
> please review it.

That's an awfully big patch for a pretty small bug?

> Subject: [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2)
> 
> Before applying this patch, cpuset updates task->mems_allowed by setting all
> new bits in the nodemask first, and clearing all old unallowed bits later.
> But in the way, the allocator can see an empty nodemask, though it is infrequent.
> 
> The problem is following:
> The size of nodemask_t is greater than the size of long integer, so loading
> and storing of nodemask_t are not atomic operations. If task->mems_allowed
> don't intersect with new_mask, such as the first word of the mask is empty
> and only the first word of new_mask is not empty. When the allocator
> loads a word of the mask before
> 
> 	current->mems_allowed |= new_mask;
> 
> and then loads another word of the mask after
> 
> 	current->mems_allowed = new_mask;
> 
> the allocator gets an empty nodemask.

Probably we could fix this via careful ordering of the updates,
barriers and perhaps some avoicance action at the reader side.

> Besides that, if the size of nodemask_t is less than the size of long integer,
> there is another problem. when the kernel allocater invokes the following function,
> 
> 	struct zoneref *next_zones_zonelist(struct zoneref *z,
> 						enum zone_type highest_zoneidx,
> 						nodemask_t *nodes,
> 						struct zone **zone)
> 	{
> 		/*
> 		 * Find the next suitable zone to use for the allocation.
> 		 * Only filter based on nodemask if it's set
> 		 */
> 		if (likely(nodes == NULL))
> 			......
>  	       else
> 			while (zonelist_zone_idx(z) > highest_zoneidx ||
> 					(z->zone && !zref_in_nodemask(z, nodes)))
> 				z++;
> 
> 		*zone = zonelist_zone(z);
> 		return z;
> 	}
> 
> if we change nodemask between two calls of zref_in_nodemask(), such as
> 	Task1						Task2
> 	zref_in_nodemask(z = node0's z, nodes = 1-2)
> 	zref_in_nodemask return 0
> 							nodes = 0
> 	zref_in_nodemask(z = node1's z, nodes = 0)
> 	zref_in_nodemask return 0
> z will overflow.

And maybe we can fix this by taking a copy into a local.

> when the kernel allocater accesses task->mempolicy, there is the same problem.

And maybe we can fix those in a similar way.

But it's all too much, and we'll just break it again in the future.  So
yup, I guess locking is needed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2010-03-31 19:42 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-08 10:10 [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily Miao Xie
2010-03-08 10:10 ` Miao Xie
2010-03-08 21:46 ` David Rientjes
2010-03-08 21:46   ` David Rientjes
2010-03-09  7:25   ` Miao Xie
2010-03-09  7:25     ` Miao Xie
2010-03-11  8:15     ` Nick Piggin
2010-03-11  8:15       ` Nick Piggin
2010-03-11 10:33       ` Miao Xie
2010-03-11 10:33         ` Miao Xie
2010-03-11 11:03         ` Nick Piggin
2010-03-11 11:03           ` Nick Piggin
2010-03-25 10:23           ` Miao Xie
2010-03-25 10:23             ` Miao Xie
2010-03-25 12:56             ` Miao Xie
2010-03-25 12:56               ` Miao Xie
2010-03-25 13:33           ` [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2) (was: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily) Miao Xie
2010-03-25 13:33             ` Miao Xie
2010-03-28  5:30             ` Bob Liu
2010-03-28  5:30               ` Bob Liu
2010-03-31 19:42             ` Andrew Morton [this message]
2010-03-31 19:42               ` Andrew Morton
2010-03-31  9:54           ` [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily Miao Xie
2010-03-31  9:54             ` Miao Xie
2010-03-31 10:34             ` David Rientjes
2010-03-31 10:34               ` David Rientjes
2010-04-01  2:16               ` Miao Xie
2010-04-01  2:16                 ` Miao Xie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100331124201.8cb20a11.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=menage@google.com \
    --cc=miaox@cn.fujitsu.com \
    --cc=npiggin@suse.de \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.