From: "Kernel.org Bugbot" <bugbot@kernel.org>
To: tj@kernel.org, bugs@lists.linux.dev, cgroups@vger.kernel.org,
hannes@cmpxchg.org, lizefan.x@bytedance.com
Subject: When processes are forked using clone3 to a cgroup in cgroup v2 with a specified cpuset.cpus, the cpuset.cpus doesn't take an effect to the new processes
Date: Tue, 11 Apr 2023 15:04:53 +0000 (UTC)
Message-ID: <20230411-b217305c0-44d643ccee27@bugzilla.kernel.org>
tcao34 writes via Kernel.org Bugzilla:
On Linux kernel 6.0 and 6.3-rc5, we found an issue involving clone3() and the cpuset controller of cgroup v2. When clone3() is called with the CLONE_INTO_CGROUP flag to clone a process into a cgroup, the cgroup's cpuset.cpus setting does not take effect for the new process.
Reproduce
==============
1) We tested kernels 6.0 and 6.3-rc5, booted with the kernel parameter "cgroup_no_v1=all" to disable cgroup v1. After boot, we create a cgroup named 't0' and set its cpuset.cpus to the first CPU (CPU 0):
echo '+cpuset' > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/t0
echo 0 > /sys/fs/cgroup/t0/cpuset.cpus
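For a scripted reproduction, the shell commands above can also be issued from C. This is only a minimal sketch: it assumes the cgroup v2 hierarchy is mounted at the directory passed in as root (for example /sys/fs/cgroup) and that the caller has permission to write there. The helper names cg_write and setup_t0 are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write val to the existing file <dir>/<file>.
 * Returns 0 on success, -1 on any error. */
int cg_write(const char *dir, const char *file, const char *val)
{
	char path[4096];
	size_t len;
	ssize_t n;
	int fd;

	assert(dir != NULL && file != NULL && val != NULL);
	len = strlen(val);
	if (snprintf(path, sizeof(path), "%s/%s", dir, file) >= (int)sizeof(path))
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, val, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}

/* Mirrors the three shell commands: enable the cpuset controller for
 * children of root, create the cgroup t0, and restrict it to CPU 0. */
int setup_t0(const char *root)
{
	char t0[4096];

	if (cg_write(root, "cgroup.subtree_control", "+cpuset") < 0)
		return -1;
	if (snprintf(t0, sizeof(t0), "%s/t0", root) >= (int)sizeof(t0))
		return -1;
	if (mkdir(t0, 0755) < 0 && errno != EEXIST)
		return -1;
	return cg_write(t0, "cpuset.cpus", "0");
}
```

Calling setup_t0("/sys/fs/cgroup") as root would be the equivalent of the three commands; the root directory is a parameter only so that the helper can be pointed at another mount point.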
2) We run the following C program, which uses the clone3 system call to clone 9 processes into cgroup 't0':
#define _GNU_SOURCE
#include <time.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/wait.h>

#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
#define __aligned_u64 uint64_t __attribute__((aligned(8)))

/* Open the cgroup directory; the resulting fd is passed to clone3(). */
int dirfd_open_opath(const char *dir)
{
	return open(dir, O_RDONLY | O_PATH);
}

/* Local copy of the clone3() argument structure. */
struct __clone_args {
	__aligned_u64 flags;
	__aligned_u64 pidfd;
	__aligned_u64 child_tid;
	__aligned_u64 parent_tid;
	__aligned_u64 exit_signal;
	__aligned_u64 stack;
	__aligned_u64 stack_size;
	__aligned_u64 tls;
	__aligned_u64 set_tid;
	__aligned_u64 set_tid_size;
	__aligned_u64 cgroup;
};

/* Fork a child directly into the cgroup referred to by cgroup_fd. */
pid_t clone_into_cgroup(int cgroup_fd)
{
	pid_t pid;
	struct __clone_args args = {
		.flags = CLONE_INTO_CGROUP,
		.exit_signal = SIGCHLD,
		.cgroup = cgroup_fd,
	};

	pid = syscall(SYS_clone3, &args, sizeof(struct __clone_args));
	if (pid < 0)
		return -1;
	return pid;
}

int main(int argc, char *argv[])
{
	int i, n = 9;
	int status = 0;
	pid_t pids[9];
	pid_t wpid;
	char cgname[100] = "/sys/fs/cgroup/t0";
	int cgroup_fd;

	for (i = 0; i < n; ++i) {
		cgroup_fd = dirfd_open_opath(cgname);
		if (cgroup_fd < 0) {
			perror("open");
			abort();
		}
		pids[i] = clone_into_cgroup(cgroup_fd);
		close(cgroup_fd);
		if (pids[i] < 0) {
			perror("clone3");
			abort();
		} else if (pids[i] == 0) {
			/* Child: report the parent pid, then spin to burn CPU. */
			printf("fork successfully %d\n", getppid());
			while (1)
				;
		}
	}

	/* Parent: wait for all children. */
	while ((wpid = wait(&status)) > 0)
		;
	return 0;
}
3) Using the 'ps' command, we see that the PIDs of the newly forked processes are 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823 and 1824.
4) "cat /sys/fs/cgroup/t0/cgroup.procs" shows that all newly forked processes are attached to cgroup 't0':
root@node0:/sys/fs/cgroup/t0# cat /sys/fs/cgroup/t0/cgroup.procs
1816
1817
1818
1819
1820
1821
1822
1823
1824
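The membership check above can be automated by scanning cgroup.procs for a given PID. This is a rough sketch; the function name pid_in_cgroup is made up for this illustration, and it relies only on the one-PID-per-line format shown in the listing above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Returns 1 if pid is listed in <cgroup_dir>/cgroup.procs,
 * 0 if it is not, and -1 if the file cannot be opened. */
int pid_in_cgroup(const char *cgroup_dir, pid_t pid)
{
	char path[4096];
	long cur;
	int found = 0;
	FILE *f;

	assert(cgroup_dir != NULL);
	if (snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	/* cgroup.procs holds one decimal PID per line. */
	while (fscanf(f, "%ld", &cur) == 1) {
		if (cur == (long)pid) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}
```

Given the listing above, pid_in_cgroup("/sys/fs/cgroup/t0", 1816) would be expected to return 1.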
5) However, when we check the CPU affinity with taskset, every newly forked process is allowed to run on all available CPUs:
root@node0:/sys/fs/cgroup/t0# taskset -p 1816
pid 1816's current affinity mask: ffffffffff
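The affinity check can also be done programmatically with sched_getaffinity(2). The sketch below tests whether a task's affinity mask stays within an expected set of CPUs; the function names affinity_within and single_cpu_set are made up for this illustration. It assumes the machine's CPU mask fits in a default cpu_set_t (glibc's limit is 1024 CPUs).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Returns 1 if every CPU in pid's affinity mask is also in *allowed,
 * 0 if the task may run on a CPU outside *allowed, -1 on error.
 * pid 0 means the calling thread. */
int affinity_within(pid_t pid, const cpu_set_t *allowed)
{
	cpu_set_t mask, both;

	assert(allowed != NULL);
	if (sched_getaffinity(pid, sizeof(mask), &mask) < 0)
		return -1;
	/* mask is a subset of allowed iff (mask & allowed) == mask. */
	CPU_AND(&both, &mask, allowed);
	return CPU_EQUAL(&both, &mask) ? 1 : 0;
}

/* Builds the set containing only the given CPU, e.g. CPU 0 to match
 * the cpuset.cpus value used in this report. */
void single_cpu_set(int cpu, cpu_set_t *set)
{
	CPU_ZERO(set);
	CPU_SET(cpu, set);
}
```

With a set built by single_cpu_set(0, &set), affinity_within(1816, &set) would be expected to return 0 on an affected kernel, since the mask reported above (ffffffffff) covers more than CPU 0.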
6) Likewise, 'top' shows each task using 100% CPU time, rather than the 9 tasks sharing the first CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1816 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1817 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1818 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1819 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1820 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1821 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1822 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1823 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
1824 root 20 0 2496 960 960 R 100.0 0.0 4:04.08 test
Root cause
==============
In $Linux_DIR/kernel/cgroup/cpuset.c, the function cpuset_fork() is implemented as:
static void cpuset_fork(struct task_struct *task)
{
	if (task_css_is_root(task, cpuset_cgrp_id))
		return;
	set_cpus_allowed_ptr(task, current->cpus_ptr);
	task->mems_allowed = current->mems_allowed;
}
It sets the allowed CPUs of the newly forked task directly from the cpus_ptr of the current task (i.e. the parent). As a result, when clone3() is used to clone a task into a different cgroup, the task still inherits the parent's allowed CPUs rather than those of the cgroup specified to clone3().
Fix
==============
Applying the following patch on top of commit 148341f0a2f53b5e8808d093333d85170586a15d fixes the issue in this scenario.
---
kernel/cgroup/cpuset.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 636f1c682ac0..fe03c21ba1af 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3254,10 +3254,12 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
  */
 static void cpuset_fork(struct task_struct *task)
 {
+	struct cpuset * cs;
 	if (task_css_is_root(task, cpuset_cgrp_id))
 		return;
-	set_cpus_allowed_ptr(task, current->cpus_ptr);
+	cs = task_cs(task);
+	set_cpus_allowed_ptr(task, cs->effective_cpus);
 	task->mems_allowed = current->mems_allowed;
 }
--
Info
==============
Host OS: Ubuntu 20.04
Processor: Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
Kernel Version: 6.3-rc5, 6.0
View: https://bugzilla.kernel.org/show_bug.cgi?id=217305#c0
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)