From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>,
Nick Piggin <npiggin@suse.de>, Paul Menage <menage@google.com>,
Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: [PATCH -V2 0/2] fix oom happening when changing cpuset'mems(was: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2))
Date: Tue, 04 May 2010 18:53:56 +0800
Message-ID: <4BDFFCC4.5000106@cn.fujitsu.com>
Nick Piggin reported that the allocator may see an empty nodemask while a
cpuset's mems is being changed [1]. That happens only on kernels that cannot
store a nodemask_t atomically (MAX_NUMNODES > BITS_PER_LONG).
But I found that there is also a problem on kernels that can store a
nodemask_t atomically: while a cpuset's mems is being changed, the allocator
may fail to find a node to allocate a page from even though there is a lot of
free memory. The sequence is as follows:
(mpol: mempolicy)
    task1                      task1's mpol    task2
    alloc page                 1
      alloc on node0? NO       1
                               1               change mems from 1 to 0
                               1               rebind task1's mpol
                               0-1               set new bits
                               0                 clear disallowed bits
      alloc on node1? NO       0
      ...
    can't alloc page
      goto oom
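The interleaving above can be replayed in a small user-space model. This is only a sketch under stated assumptions, not kernel code: the nodemask is reduced to a plain bitmask, and NR_NODES, node_allowed() and pick_node() are hypothetical names introduced for illustration. The concurrent writer is modeled by handing the scan a different mask value for each node it tests.

```c
#include <assert.h>

#define NR_NODES 2	/* assumption: two nodes, as in the timeline */

/* Test one bit of a nodemask modeled as a plain bitmask. */
static int node_allowed(unsigned long mask, int node)
{
	assert(node >= 0 && node < NR_NODES);
	return (int)((mask >> node) & 1UL);
}

/*
 * Model of the allocator's scan.  mask_at_step[i] is the value the shared
 * mempolicy nodemask holds at the moment node i is tested, so a concurrent
 * writer is modeled by passing a different value for each step.
 * Returns the first allowed node, or -1 when every test fails.
 */
static int pick_node(const unsigned long mask_at_step[NR_NODES])
{
	int node;

	for (node = 0; node < NR_NODES; node++)
		if (node_allowed(mask_at_step[node], node))
			return node;
	return -1;
}
```

With mask_at_step = { 1UL << 1, 1UL << 0 }, that is, the mask is {1} when node0 is tested and {0} when node1 is tested, pick_node() returns -1 although each mask on its own names a usable node.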
I can reproduce it with the attached program by the following steps:
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
<nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh
Several hours later, OOM will happen even though there is a lot of free memory.
This patchset fixes the problem by expanding the nodes range first (setting
the newly allowed bits) and shrinking it lazily (clearing the newly disallowed
bits). A variable tells the write-side task that the read-side task is
reading the nodemask, and the write-side task clears the newly disallowed
nodes only after the read-side task finishes its current memory allocation.
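The expand-first, shrink-lazily idea can be sketched in user space as follows. This is a single-threaded model under stated assumptions, not the code in the patches: struct task_model, its reading flag, expand_mems(), try_shrink_mems() and replay_with_lazy_shrink() are hypothetical names, and the locking and memory barriers that a real implementation would need are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-task state; not the kernel's task_struct. */
struct task_model {
	unsigned long mems;	/* allowed nodes as a bitmask */
	bool reading;		/* set while the read side is allocating */
};

/* Step 1: only add the newly allowed nodes; nothing is removed yet. */
static void expand_mems(struct task_model *t, unsigned long newmems)
{
	t->mems |= newmems;
}

/*
 * Step 2: clear the newly disallowed nodes, but only if no allocation is
 * in progress.  Returns false when the writer has to retry later.
 */
static bool try_shrink_mems(struct task_model *t, unsigned long newmems)
{
	if (t->reading)
		return false;
	t->mems &= newmems;
	return true;
}

/*
 * Replay the failing interleaving (mems changes from {1} to {0} between
 * the reader's two tests).  Returns the node the reader ends up with.
 */
static int replay_with_lazy_shrink(struct task_model *t)
{
	const unsigned long newmems = 1UL << 0;
	int picked = -1;
	bool shrunk;

	t->mems = 1UL << 1;
	t->reading = true;			/* reader: start allocation */
	if (t->mems & (1UL << 0))		/* reader: node0? no */
		picked = 0;
	expand_mems(t, newmems);		/* writer: mems = {0,1} */
	shrunk = try_shrink_mems(t, newmems);	/* writer: refused, reader busy */
	assert(!shrunk);
	if (picked < 0 && (t->mems & (1UL << 1)))	/* reader: node1? yes */
		picked = 1;
	t->reading = false;			/* reader: allocation done */
	shrunk = try_shrink_mems(t, newmems);	/* writer: mems = {0} */
	assert(shrunk);
	return picked;
}
```

In this model the reader still gets node1 for the allocation that was already in flight, and the mask settles at {0} only after that allocation has ended.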
Changelog since V1:
- Restructure the mempolicy rebind functions and split the rebind work into
two steps, because the rebind functions may break the first step (expanding
the nodes range).
Thanks
Miao
[1] http://lkml.org/lkml/2010/2/18/111
[PATCH 1/2] mempolicy: restructure rebinding-mempolicy functions
[PATCH 2/2] cpuset,mm: fix no node to alloc memory when changing cpuset's mems
[-- Attachment #2: reproduce_prog.tar.gz --]
[-- Type: application/gzip, Size: 1190 bytes --]