* [PATCH -V2 0/2] fix oom happening when changing cpuset's mems(was: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2))
@ 2010-05-04 10:53 Miao Xie
From: Miao Xie @ 2010-05-04 10:53 UTC
To: David Rientjes, Nick Piggin, Paul Menage, Lee Schermerhorn
Cc: Andrew Morton, Linux-Kernel, Linux-MM
[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]
Nick Piggin reported that the allocator may see an empty nodemask when
cpuset's mems is changed[1]. This happens only on kernels that cannot
store a nodemask_t atomically (MAX_NUMNODES > BITS_PER_LONG).
But I found that there is also a problem on kernels that can store a
nodemask_t atomically: while cpuset's mems is being changed, the allocator
may fail to find any node to allocate a page from, even though there is
plenty of free memory. The reason is as follows:
(mpol: mempolicy)
    task1                    task1's mpol    task2
    alloc page               1
      alloc on node0? NO     1
                             1               change mems from 1 to 0
                             1               rebind task1's mpol
                             0-1               set new bits
                             0                 clear disallowed bits
      alloc on node1? NO     0
      ...
    can't alloc page
      goto oom
The problem can be reproduced with the attached program using the following steps:
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
<nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh
Several hours later, the OOM killer is triggered even though there is plenty of free memory.
This patchset fixes the problem by expanding the node range first (setting
the newly allowed bits) and shrinking it lazily (clearing the newly
disallowed bits). A variable tells the write-side task that a read-side
task is reading the nodemask, and the write-side task clears the newly
disallowed nodes only after the read-side task has finished its current
memory allocation.
Changelog since V1:
- restructure the mempolicy rebind functions and split the rebind work
  into two steps, because the rebind functions may break the first step
  (expanding the node range).
Thanks
Miao
[1] http://lkml.org/lkml/2010/2/18/111
[PATCH 1/2] mempolicy: restructure rebinding-mempolicy functions
[PATCH 2/2] cpuset,mm: fix no node to alloc memory when changing cpuset's mems
[-- Attachment #2: reproduce_prog.tar.gz --]
[-- Type: application/gzip, Size: 1190 bytes --]