From: Feng Tang <feng.tang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
"cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"
<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
"Hansen, Dave" <dave.hansen-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
"Huang, Ying" <ying.huang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
"stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v2] cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
Date: Wed, 27 Apr 2022 20:09:58 +0800
Message-ID: <20220427120958.GD84190@shbuild999.sh.intel.com>
In-Reply-To: <4c6847ba-4c8d-9776-a065-684a8b95130b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Tue, Apr 26, 2022 at 10:34:21PM -0400, Waiman Long wrote:
> On 4/26/22 21:06, Feng Tang wrote:
> > On Tue, Apr 26, 2022 at 10:58:21PM +0800, Waiman Long wrote:
> > > On 4/25/22 23:23, Feng Tang wrote:
> > > > Hi Waiman,
> > > >
> > > > On Mon, Apr 25, 2022 at 11:55:05AM -0400, Waiman Long wrote:
> > > > > There are 3 places where the cpu and node masks of the top cpuset can
> > > > > be initialized in the order they are executed:
> > > > > 1) start_kernel -> cpuset_init()
> > > > > 2) start_kernel -> cgroup_init() -> cpuset_bind()
> > > > > 3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp()
> > > > >
> > > > > The first cpuset_init() function just sets all the bits in the masks.
> > > > > The last one executed is cpuset_init_smp() which sets up cpu and node
> > > > > masks suitable for v1, but not v2. cpuset_bind() does the right setup
> > > > > for both v1 and v2.
> > > > >
> > > > > For systems with cgroup v2 setup, cpuset_bind() is called once. For
> > > > > systems with cgroup v1 setup, cpuset_bind() is called twice. It is
> > > > > first called before cpuset_init_smp() in cgroup v2 mode. Then it is
> > > > > called again when cgroup v1 filesystem is mounted in v1 mode after
> > > > > cpuset_init_smp().
> > > > >
> > > > > [ 2.609781] cpuset_bind() called - v2 = 1
> > > > > [ 3.079473] cpuset_init_smp() called
> > > > > [ 7.103710] cpuset_bind() called - v2 = 0
> > > > > I ran some tests. On a server running CentOS, cpuset_bind() was
> > > > > indeed called twice: first as v2 during kernel boot, and then
> > > > > as v1 post-boot.
> > > >
> > > > However on a QEMU running with a basic debian rootfs image,
> > > > the second call of cpuset_bind() didn't happen.
> > > The first time cpuset_bind() is called in cgroup_init(), the kernel
> > > doesn't know if userspace is going to mount v1 or v2 cgroup. By default,
> > > it is assumed to be v2. However, if userspace mounts the cgroup v1
> > > filesystem for cpuset, cpuset_bind() will be run at this point by
> > > rebind_subsystem() to set up cgroup v1 environment and
> > > cpus_allowed/mems_allowed will be correctly set at this point. Mounting
> > > the cgroup v2 filesystem, however, does not cause rebind_subsystem() to
> > > run and hence cpuset_bind() is not called again.
> > >
> > > Is the QEMU setup not mounting any cgroup filesystem at all? If so, does
> > > it matter whether v1 or v2 setup is used?
> > When I got the cpuset binding error report, I first tried to reproduce
> > it on QEMU and failed (because there was no memory hotplug), then I
> > reproduced it on a real server. On both systems, I used the
> > "cgroup_no_v1=all" cmdline parameter to test cgroup-v2. Could this be
> > the reason? (TBH, this is the first time I have used cgroup-v2.)
> >
> > Here is the info dump:
> >
> > # mount | grep cgroup
> > tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
> > cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
> >
> > # cat /proc/filesystems | grep cgroup
> > nodev cgroup
> > nodev cgroup2
> >
> > Thanks,
> > Feng
>
> For cgroup v2, cpus_allowed should be set to cpu_possible_mask and
> mems_allowed to node_possible_map as is done in the first invocation of
> cpuset_bind(). That is the correct behavior.
OK. For the cgroup v2 mem binding problem with hot-added nodes, I
retested today, and it can't be reproduced with this patch. So feel
free to add:
Tested-by: Feng Tang <feng.tang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Thanks,
Feng
> Cheers,
> Longman
>
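
For readers following the thread, the ordering discussed above can be
modeled with a small user-space sketch. This is not kernel code: the mask
values, the TopCpuset class, and the boot() helper are all hypothetical,
and the bodies of the three functions are simplified assumptions based on
the descriptions quoted in this thread (cpuset_init() sets all bits,
cpuset_bind() picks the v2 or v1 setup, and the unpatched
cpuset_init_smp() overwrites the masks with a v1-style value). Only
mems_allowed is modeled; cpus_allowed would follow the same pattern.

```python
# Toy model of the boot-time ordering discussed in this thread.
# All values are made up for illustration; none of this is kernel code.

ALL_BITS = 0xFFFF               # what cpuset_init() leaves behind ("all bits set")
POSSIBLE_NODES = 0b1111         # hypothetical: nodes 0-3 are possible
MEMORY_NODES_AT_BOOT = 0b0011   # hypothetical: only nodes 0-1 have memory at boot


class TopCpuset:
    """Hypothetical stand-in for the top cpuset's node masks."""

    def __init__(self):
        self.mems_allowed = 0
        self.effective_mems = 0


def cpuset_init(cs):
    # Step 1: just set all the bits in the masks.
    cs.mems_allowed = ALL_BITS
    cs.effective_mems = ALL_BITS


def cpuset_bind(cs, v2):
    # Step 2, and again if userspace mounts cgroup v1 for cpuset.
    # Assumption: v2 uses the possible map, v1 mirrors the effective mask.
    cs.mems_allowed = POSSIBLE_NODES if v2 else cs.effective_mems


def cpuset_init_smp(cs, patched):
    # Step 3: without the patch, mems_allowed is overwritten with a
    # v1-style value; with the patch, only the effective mask is set.
    if not patched:
        cs.mems_allowed = MEMORY_NODES_AT_BOOT
    cs.effective_mems = MEMORY_NODES_AT_BOOT


def boot(patched, mount_v1):
    """Run the three steps in order and return the final mems_allowed."""
    cs = TopCpuset()
    cpuset_init(cs)
    cpuset_bind(cs, v2=True)        # the kernel assumes v2 at cgroup_init() time
    cpuset_init_smp(cs, patched)
    if mount_v1:
        cpuset_bind(cs, v2=False)   # rebind when the v1 filesystem is mounted
    return cs.mems_allowed


if __name__ == "__main__":
    for patched in (False, True):
        for mount_v1 in (False, True):
            mask = boot(patched, mount_v1)
            print(f"patched={patched} mount_v1={mount_v1} mems_allowed={mask:#06b}")
```

In this model, a v2-only system without the patch ends up with the boot-time
memory nodes instead of the possible map, so nodes that are hot-added later
would not be covered. With the patch, the v2 case keeps the possible map,
and the v1 case is unchanged because the second cpuset_bind() call sets the
masks again.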