All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Miao Xie <miaox@cn.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	riel@redhat.com
Subject: Re: [PATCH] cpuset: mm: Reduce large amounts of memory barrier related damage v3
Date: Thu, 29 Aug 2013 13:14:19 +0200	[thread overview]
Message-ID: <20130829111419.GA10002@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20130829105656.GD22421@suse.de>

On Thu, Aug 29, 2013 at 11:56:57AM +0100, Mel Gorman wrote:
> > I thought it was, we crashed somewhere suspiciously close, but no. You
> > need shared mpols for this to actually trigger and the NUMA stuff
> > doesn't use that.
> > 
> 
> Ah, so this is a red herring?

Yeah, but I still think its an actual bug. It seems the easiest way to
trigger this would be to:

 create a task that constantly allocates pages
 have said task have an MPOL_INTERLEAVE task policy
 put said task into a cpuset
 using a different task (your shell for example) flip the cpuset's
   mems_allowed back and forth.

This would have the shell task constantly rebind (in two steps) our
allocating task's INTERLEAVE policy.

> > I used whatever nodemask.h did to detect end-of-bitmap and they use
> > MAX_NUMNODES. See __next_node() and for_each_node() like.
> > 
> 
> The check does prevent us going off the end of the bitmap but does not
> necessarily return an online node.

Right, but its guaranteed to return a 'valid' node. I don't think it
returning an offline node is a problem, we'll find it empty and fail the
page allocation.

> > MAX_NUMNODES doesn't assume contiguous numbers since its the actual size
> > of the bitmap, nr_online_nodes would hoever.
> > 
> 
> I intended to say nr_node_ids, the same size as buffers such as the
> task_numa_buffers. If we ever return a nid > nr_node_ids here then
> task_numa_fault would corrupt memory. However, it should be possible for
> node_weight to exceed nr_node_ids except maybe during node hot-remove so
> it's not the problem.

The nodemask situation seems somewhat more confused than the cpumask
case; how would we ever return a nid > nr_node_ids? Corrupt nodemask?

In the cpumask case we use the runtime limit nr_cpu_ids for all bitmap
operations, arguably we should make the nodemask stuff do the same.

Less bits to iterate is always good; a MAX_NUMNODES=64
(x86_64-defconfig) will still iterate all 64 bits, even though its
unlikely to have more than 1 let alone more than 8 nodes.

> > So I explicitly didn't use the node_isset() test because that's more
> > likely to trigger than the nid >= MAX_NUMNODES test. Its fine to return
> > a node that isn't actually part of the mask anymore -- a race is a race
> > anyway.
> 
> Yeah and as long as it's < nr_node_ids it should be ok within the task
> numa fault handling as well.

Right, I'm just a tad confused on how we could ever get a nid >=
nr_node_ids except from a prior bug (corrupted nodemask).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Miao Xie <miaox@cn.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	riel@redhat.com
Subject: Re: [PATCH] cpuset: mm: Reduce large amounts of memory barrier related damage v3
Date: Thu, 29 Aug 2013 13:14:19 +0200	[thread overview]
Message-ID: <20130829111419.GA10002@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20130829105656.GD22421@suse.de>

On Thu, Aug 29, 2013 at 11:56:57AM +0100, Mel Gorman wrote:
> > I thought it was, we crashed somewhere suspiciously close, but no. You
> > need shared mpols for this to actually trigger and the NUMA stuff
> > doesn't use that.
> > 
> 
> Ah, so this is a red herring?

Yeah, but I still think its an actual bug. It seems the easiest way to
trigger this would be to:

 create a task that constantly allocates pages
 have said task have an MPOL_INTERLEAVE task policy
 put said task into a cpuset
 using a different task (your shell for example) flip the cpuset's
   mems_allowed back and forth.

This would have the shell task constantly rebind (in two steps) our
allocating task's INTERLEAVE policy.

> > I used whatever nodemask.h did to detect end-of-bitmap and they use
> > MAX_NUMNODES. See __next_node() and for_each_node() like.
> > 
> 
> The check does prevent us going off the end of the bitmap but does not
> necessarily return an online node.

Right, but its guaranteed to return a 'valid' node. I don't think it
returning an offline node is a problem, we'll find it empty and fail the
page allocation.

> > MAX_NUMNODES doesn't assume contiguous numbers since its the actual size
> > of the bitmap, nr_online_nodes would hoever.
> > 
> 
> I intended to say nr_node_ids, the same size as buffers such as the
> task_numa_buffers. If we ever return a nid > nr_node_ids here then
> task_numa_fault would corrupt memory. However, it should be possible for
> node_weight to exceed nr_node_ids except maybe during node hot-remove so
> it's not the problem.

The nodemask situation seems somewhat more confused than the cpumask
case; how would we ever return a nid > nr_node_ids? Corrupt nodemask?

In the cpumask case we use the runtime limit nr_cpu_ids for all bitmap
operations, arguably we should make the nodemask stuff do the same.

Less bits to iterate is always good; a MAX_NUMNODES=64
(x86_64-defconfig) will still iterate all 64 bits, even though its
unlikely to have more than 1 let alone more than 8 nodes.

> > So I explicitly didn't use the node_isset() test because that's more
> > likely to trigger than the nid >= MAX_NUMNODES test. Its fine to return
> > a node that isn't actually part of the mask anymore -- a race is a race
> > anyway.
> 
> Yeah and as long as it's < nr_node_ids it should be ok within the task
> numa fault handling as well.

Right, I'm just a tad confused on how we could ever get a nid >=
nr_node_ids except from a prior bug (corrupted nodemask).

  reply	other threads:[~2013-08-29 11:14 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-07 18:08 [PATCH] cpuset: mm: Reduce large amounts of memory barrier related damage v3 Mel Gorman
2012-03-07 18:08 ` Mel Gorman
2012-03-26 10:56 ` Peter Zijlstra
2012-03-26 10:56   ` Peter Zijlstra
2012-03-26 11:07   ` Peter Zijlstra
2012-03-26 11:07     ` Peter Zijlstra
2012-03-26 15:50   ` Mel Gorman
2012-03-26 15:50     ` Mel Gorman
2012-03-26 16:20     ` Peter Zijlstra
2012-03-26 16:20       ` Peter Zijlstra
2012-03-27 12:47       ` Mel Gorman
2012-03-27 12:47         ` Mel Gorman
2012-03-27 13:14         ` [PATCH] mm: Optimize put_mems_allowed() usage Peter Zijlstra
2012-03-27 13:14           ` Peter Zijlstra
2012-05-17 10:33           ` Peter Zijlstra
2012-05-17 10:33             ` Peter Zijlstra
2012-05-17 20:16           ` Andrew Morton
2012-05-17 20:16             ` Andrew Morton
2012-05-17 20:23             ` Peter Zijlstra
2012-05-17 20:23               ` Peter Zijlstra
2012-05-18 10:20   ` [tip:sched/numa] " tip-bot for Peter Zijlstra
2013-08-23 13:03 ` [PATCH] cpuset: mm: Reduce large amounts of memory barrier related damage v3 Peter Zijlstra
2013-08-23 13:03   ` Peter Zijlstra
2013-08-23 18:15   ` Peter Zijlstra
2013-08-23 18:15     ` Peter Zijlstra
2013-08-26  5:32     ` Rik van Riel
2013-08-26  5:32       ` Rik van Riel
2013-08-29  9:28     ` Mel Gorman
2013-08-29  9:28       ` Mel Gorman
2013-08-29  9:43       ` Peter Zijlstra
2013-08-29  9:43         ` Peter Zijlstra
2013-08-29  9:45         ` Peter Zijlstra
2013-08-29  9:45           ` Peter Zijlstra
2013-08-29 10:56         ` Mel Gorman
2013-08-29 10:56           ` Mel Gorman
2013-08-29 11:14           ` Peter Zijlstra [this message]
2013-08-29 11:14             ` Peter Zijlstra
2013-08-29 12:10             ` Mel Gorman
2013-08-29 12:10               ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130829111419.GA10002@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=miaox@cn.fujitsu.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.