From: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: "Kernel.org Bugbot"
<bugbot-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
bugs-cunTk1MwBs/YUNznpcFYbw@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org
Subject: Re: When processes are forked using clone3 to a cgroup in cgroup v2 with a specified cpuset.cpus, the cpuset.cpus doesn't take an effect to the new processes
Date: Tue, 11 Apr 2023 11:37:40 -0400
Message-ID: <490db90c-6afd-d934-4cd2-2722579f377d@redhat.com>
In-Reply-To: <20230411-b217305c0-44d643ccee27-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org>
On 4/11/23 11:04, Kernel.org Bugbot wrote:
> tcao34 writes via Kernel.org Bugzilla:
>
> When using Linux kernel 6.0 or 6.3-rc5, we found an issue related to clone3 and the cpuset subsystem of cgroup v2. When I use clone3 with the CLONE_INTO_CGROUP flag to clone a process into a cgroup, the cgroup's cpuset.cpus has no effect on the new processes.
This is a known issue and has been reported before. An upstream patch
to fix this problem is being discussed [1].
[1] https://lore.kernel.org/lkml/20230411133601.2969636-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org/
Cheers,
Longman
>
> Reproduce
> ==============
> 1) I'm using kernel 6.0 and kernel 6.3-rc5. When booting the kernel, I add the boot parameter "cgroup_no_v1=all" to disable cgroup v1.
>
> 2) We create a cgroup named 't0' and set its cpuset.cpus to the first CPU:
>
> echo '+cpuset' > /sys/fs/cgroup/cgroup.subtree_control
> mkdir /sys/fs/cgroup/t0
> echo 0 > /sys/fs/cgroup/t0/cpuset.cpus
>
> 2) We run the following C program, in which we use the clone3 system call to clone 9 processes into cgroup 't0':
>
> #define _GNU_SOURCE
>
> #include <time.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <signal.h> /* SIGCHLD */
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdint.h>
> #include <sys/syscall.h>
> #include <sys/wait.h>
> #define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
>
> #define __aligned_u64 uint64_t __attribute__((aligned(8)))
>
> int dirfd_open_opath(const char *dir)
> {
>     return open(dir, O_RDONLY | O_PATH);
> }
>
> struct __clone_args {
>     __aligned_u64 flags;
>     __aligned_u64 pidfd;
>     __aligned_u64 child_tid;
>     __aligned_u64 parent_tid;
>     __aligned_u64 exit_signal;
>     __aligned_u64 stack;
>     __aligned_u64 stack_size;
>     __aligned_u64 tls;
>     __aligned_u64 set_tid;
>     __aligned_u64 set_tid_size;
>     __aligned_u64 cgroup;
> };
>
> pid_t clone_into_cgroup(int cgroup_fd)
> {
>     pid_t pid;
>     struct __clone_args args = {
>         .flags = CLONE_INTO_CGROUP,
>         .exit_signal = SIGCHLD,
>         .cgroup = cgroup_fd,
>     };
>     pid = syscall(SYS_clone3, &args, sizeof(struct __clone_args));
>
>     if (pid < 0)
>         return -1;
>
>     return pid;
> }
>
>
> int main(int argc, char *argv[]) {
>     int i, n = 9;
>     int status = 0;
>     pid_t pids[9];
>     pid_t wpid;
>     char cgname[100] = "/sys/fs/cgroup/t0";
>     int cgroup_fd;
>
>     for (i = 0; i < n; ++i) {
>         cgroup_fd = dirfd_open_opath(cgname);
>         pids[i] = clone_into_cgroup(cgroup_fd);
>         close(cgroup_fd);
>         if (pids[i] < 0) {
>             perror("fork");
>             abort();
>         } else if (pids[i] == 0) {
>             printf("fork successfully %d\n", getppid());
>             while(1);
>         }
>     }
>     while ((wpid = wait(&status)) > 0);
>
> }
>
> 3) Using the 'ps' command, we see that the pids of the newly forked processes are: 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824
>
> 4) When we call "cat /sys/fs/cgroup/t0/cgroup.procs", the results show that all new forked processes are attached to the cgroup 't0':
> root@node0:/sys/fs/cgroup/t0# cat /sys/fs/cgroup/t0/cgroup.procs
> 1816
> 1817
> 1818
> 1819
> 1820
> 1821
> 1822
> 1823
> 1824
>
> 5) However, when we use taskset to check the CPU affinity, all newly forked processes are allowed to use all available CPUs.
> root@node0:/sys/fs/cgroup/t0# taskset -p 1816
> pid 1816's current affinity mask: ffffffffff
>
> 6) Also, if we check with 'top', each task is using 100% CPU time, rather than the 9 tasks sharing the first CPU.
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 1816 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1817 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1818 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1819 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1820 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1821 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1822 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1823 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
> 1824 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
>
> Root cause
> ==============
> In $Linux_DIR/kernel/cgroup/cpuset.c, function cpuset_fork works as:
> static void cpuset_fork(struct task_struct *task)
> {
> 	if (task_css_is_root(task, cpuset_cgrp_id))
> 		return;
>
> 	set_cpus_allowed_ptr(task, current->cpus_ptr);
> 	task->mems_allowed = current->mems_allowed;
> }
>
> It directly sets the allowed CPUs of the newly forked task to the cpus_ptr of the current task (i.e. the parent task). As a result, when clone3() places a task into a different cgroup, the task still inherits the parent's allowed CPUs rather than those of the cgroup that clone3() specified.
>
> Fix
> ==============
> We applied the patch below on top of commit 148341f0a2f53b5e8808d093333d85170586a15d, and it fixes the issue in this scenario.
>
> ---
> kernel/cgroup/cpuset.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 636f1c682ac0..fe03c21ba1af 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3254,10 +3254,12 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
>   */
>  static void cpuset_fork(struct task_struct *task)
>  {
> +	struct cpuset * cs;
>  	if (task_css_is_root(task, cpuset_cgrp_id))
>  		return;
>
> -	set_cpus_allowed_ptr(task, current->cpus_ptr);
> +	cs = task_cs(task);
> +	set_cpus_allowed_ptr(task, cs->effective_cpus);
>  	task->mems_allowed = current->mems_allowed;
>  }
>