From: <wang.yi59@zte.com.cn>
To: <yury.norov@gmail.com>
Cc: <andriy.shevchenko@linux.intel.com>, <linux@rasmusvillemoes.dk>,
<linux-kernel@vger.kernel.org>, <xue.zhihong@zte.com.cn>,
<wang.liang82@zte.com.cn>, <Liu.Jianjun3@zte.com.cn>
Subject: Re:[PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask()
Date: Tue, 14 Jun 2022 11:45:33 +0800 (CST) [thread overview]
Message-ID: <202206141145339651323@zte.com.cn> (raw)
In-Reply-To: <CAAH8bW8wD_hsOqtWa-g_1SNWNi7GHzsu9RvL8feY069JPKFWBA@mail.gmail.com>
Hi Yury,
Thanks for your quick and clear response!
> On Mon, Jun 13, 2022 at 4:31 AM Yi Wang <wang.yi59@zte.com.cn> wrote:
> >
> > Consider one situation:
> >
> > The app have two vmas which mbind() to node 1 and node3 respectively,
> > and its cpuset.mems is 0-3, now set its cpuset.mems to 1,3, according
> > to current bitmap_remap(), we got:
>
> Regarding the original problem - can you please confirm that
> it's reproduced on current kernels, show the execution path etc.
> From what I see on modern kernel, the only user of nodes_remap()
> is mpol_rebind_nodemask(). Is that the correct path?
Yes, it's mpol_rebind_nodemask() calls nodes_remap() from
mpol_rebind_policy(). The stacks are as follow:
[ 290.836747] bitmap_remap+0x84/0xe0
[ 290.836753] mpol_rebind_nodemask+0x64/0x2a0
[ 290.836764] mpol_rebind_mm+0x3a/0x90
[ 290.836768] update_tasks_nodemask+0x8a/0x1e0
[ 290.836774] cpuset_write_resmask+0x563/0xa00
[ 290.836780] cgroup_file_write+0x81/0x150
[ 290.836784] kernfs_fop_write_iter+0x12d/0x1c0
[ 290.836791] new_sync_write+0x109/0x190
[ 290.836800] vfs_write+0x218/0x2a0
[ 290.836809] ksys_write+0x59/0xd0
[ 290.836812] do_syscall_64+0x37/0x80
[ 290.836818] entry_SYSCALL_64_after_hwframe+0x46/0xb0
To reproduce this situation, I write a program which seems like this:
unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS;
unsigned long size = 262144 << 12;
unsigned long node1 = 2; // node 1
unsigned long node2 = 8; // node 3
p1 = vma1 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
p2 = vma2 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
assert(!mbind(vma1, size, MPOL_BIND, &node1, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE));
assert(!mbind(vma2, size, MPOL_BIND, &node2, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE));
Start the program whos name is mbind_tester, and do follow steps:
mkdir && cd /sys/fs/cgroup/cpuset/mbind
echo 0-31 > cpuset.cpus
echo 0-3 > cpuset.mems
cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w
7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4
7ff77e200000 bind:1 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4
echo 1,3 > cpuset.mems
cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w
7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4
7ff77e200000 bind:3 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4
As you see, after set cpuset.mems to 1,3, the nodes which one of vma
binded to changed from 1 to 3.
This maybe confused, the original nodes binded is 1, after modify
cpuset.mems to 1,3 which include the node 3, it changed to 3...
>
> Anyways, as per name, bitmap_remap() is intended to change bit
> positions, and it doesn't look wrong if it does so.
>
> This is not how the function is supposed to work. For example,
> old: 00111000
> new: 00011100
>
> means:
> old: 00111 000
> || \\\|||
> new: 000 11100
>
> And after this patch it would be:
> old: 001 11000
> || \|||||
> new: 000 11100
>
> Which is not the same, right?
Right.
Actually this is what makes me embarrassed. If we want to fix this
situtation, we can:
- change the bitmap_remap() as this patch did, but this changed the
behavior of this routine which looks does the right thing. One good
news is this function is only called by mpol_rebind_nodemask().
- don't change the bitmap_remap(), to be honest, I didn't figure out
a way. Any suggestions?
>
> If mpol_rebind() wants to keep previous relations, then according to
> the comment:
> * The positions of unset bits in @old are mapped to themselves
> * (the identify map).
>
> , you can just clear @old bits that already have good relations
> you'd like to preserve.
Actually this does not work for me :)
In the example above, if set cpuset.mems to 0,2 firstly, the nodes
binds will change from 1 to 2. And then set cpuset.mems to 1,3, it will
change from 2 to 3 again.
---
Best wishes
Yi Wang
next prev parent reply other threads:[~2022-06-14 3:45 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-13 11:29 [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() Yi Wang
2022-06-13 19:20 ` Yury Norov
2022-06-14 3:45 ` wang.yi59 [this message]
2022-06-14 16:40 ` Yury Norov
2022-06-14 16:50 ` Andy Shevchenko
2022-06-15 8:54 ` wang.yi59
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202206141145339651323@zte.com.cn \
--to=wang.yi59@zte.com.cn \
--cc=Liu.Jianjun3@zte.com.cn \
--cc=andriy.shevchenko@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=wang.liang82@zte.com.cn \
--cc=xue.zhihong@zte.com.cn \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.