public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: <wang.yi59@zte.com.cn>
To: <yury.norov@gmail.com>
Cc: <andriy.shevchenko@linux.intel.com>, <linux@rasmusvillemoes.dk>,
	<linux-kernel@vger.kernel.org>, <xue.zhihong@zte.com.cn>,
	<wang.liang82@zte.com.cn>, <Liu.Jianjun3@zte.com.cn>
Subject: Re:[PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask()
Date: Tue, 14 Jun 2022 11:45:33 +0800 (CST)	[thread overview]
Message-ID: <202206141145339651323@zte.com.cn> (raw)
In-Reply-To: <CAAH8bW8wD_hsOqtWa-g_1SNWNi7GHzsu9RvL8feY069JPKFWBA@mail.gmail.com>

Hi Yury,                                                                                                                                                                                                            
                                                                                                                                                                                                                    
Thanks for your quick and clear response!                                                                                                                                                                           
                                                                                                                                                                                                                    
> On Mon, Jun 13, 2022 at 4:31 AM Yi Wang <wang.yi59@zte.com.cn> wrote:                                                                                                                                             
> >                                                                                                                                                                                                                 
> > Consider one situation:                                                                                                                                                                                         
> >                                     
> > The app have two vmas which mbind() to node 1 and node3 respectively,
> > and its cpuset.mems is 0-3, now set its cpuset.mems to 1,3, according
> > to current bitmap_remap(), we got:                         
>                                                       
> Regarding the original problem - can you please confirm that
> it's reproduced on current kernels, show the execution path etc.
> From what I see on modern kernel, the only user of nodes_remap()
> is mpol_rebind_nodemask(). Is that the correct path?       
                                                                
Yes, it's mpol_rebind_nodemask() calls nodes_remap() from
mpol_rebind_policy(). The stacks are as follow:     
[  290.836747]  bitmap_remap+0x84/0xe0
[  290.836753]  mpol_rebind_nodemask+0x64/0x2a0
[  290.836764]  mpol_rebind_mm+0x3a/0x90
[  290.836768]  update_tasks_nodemask+0x8a/0x1e0
[  290.836774]  cpuset_write_resmask+0x563/0xa00
[  290.836780]  cgroup_file_write+0x81/0x150
[  290.836784]  kernfs_fop_write_iter+0x12d/0x1c0
[  290.836791]  new_sync_write+0x109/0x190
[  290.836800]  vfs_write+0x218/0x2a0
[  290.836809]  ksys_write+0x59/0xd0
[  290.836812]  do_syscall_64+0x37/0x80
[  290.836818]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
                                          
To reproduce this situation, I write a program which seems like this:
    unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    unsigned long size = 262144 << 12;                            
    unsigned long node1 = 2;   // node 1
    unsigned long node2 = 8;   // node 3
                                                                         
    p1 = vma1 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    p2 = vma2 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
                       
    assert(!mbind(vma1, size, MPOL_BIND, &node1, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE));
    assert(!mbind(vma2, size, MPOL_BIND, &node2, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE));
                                                                      
Start the program whos name is mbind_tester, and do follow steps:
                                                                 
        mkdir && cd /sys/fs/cgroup/cpuset/mbind
        echo 0-31 > cpuset.cpus     
        echo 0-3 > cpuset.mems                                   
                                    
        cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w                
        7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4                                                                                                                        
        7ff77e200000 bind:1 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4

        echo 1,3 > cpuset.mems
        cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w
        7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4
        7ff77e200000 bind:3 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4

As you see, after set cpuset.mems to 1,3, the nodes which one of vma
binded to changed from 1 to 3.

This maybe confused, the original nodes binded is 1, after modify
cpuset.mems to 1,3 which include the node 3, it changed to 3...

>
> Anyways, as per name, bitmap_remap() is intended to change bit
> positions, and it doesn't look wrong if it does so.
>
> This is not how the function is supposed to work. For example,
>         old: 00111000
>         new: 00011100
>
> means:
>         old: 00111 000
>              || \\\|||
>         new: 000 11100
>
> And after this patch it would be:
>         old: 001 11000
>              || \|||||
>         new: 000 11100
>
> Which is not the same, right?

Right.

Actually this is what makes me embarrassed. If we want to fix this
situtation, we can:

- change the bitmap_remap() as this patch did, but this changed the
behavior of this routine which looks does the right thing. One good
news is this function is only called by mpol_rebind_nodemask().

- don't change the bitmap_remap(), to be honest, I didn't figure out
a way. Any suggestions?

>
> If mpol_rebind() wants to keep previous relations, then according to
> the comment:
>  * The positions of unset bits in @old are mapped to themselves
>  * (the identify map).
>
> , you can just clear @old bits that already have good relations
> you'd like to preserve.

Actually this does not work for me :)

In the example above, if set cpuset.mems to 0,2 firstly, the nodes
binds will change from 1 to 2. And then set cpuset.mems to 1,3, it will
change from 2 to 3 again.


---
Best wishes
Yi Wang

  reply	other threads:[~2022-06-14  3:45 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-13 11:29 [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() Yi Wang
2022-06-13 19:20 ` Yury Norov
2022-06-14  3:45   ` wang.yi59 [this message]
2022-06-14 16:40     ` Yury Norov
2022-06-14 16:50       ` Andy Shevchenko
2022-06-15  8:54       ` wang.yi59

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202206141145339651323@zte.com.cn \
    --to=wang.yi59@zte.com.cn \
    --cc=Liu.Jianjun3@zte.com.cn \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=wang.liang82@zte.com.cn \
    --cc=xue.zhihong@zte.com.cn \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox