From: zhengzucheng <zhengzucheng@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>, <mingo@redhat.com>,
<juri.lelli@redhat.com>, <vincent.guittot@linaro.org>,
<rostedt@goodmis.org>, <bsegall@google.com>, <mgorman@suse.de>,
<vschneid@redhat.com>, <vincent.guittot@linaro.org>,
<tjcao980311@gmail.com>, zhengzucheng <zhengzucheng@huawei.com>
Cc: <linux-kernel@vger.kernel.org>
Subject: [Question] sched:the load is unbalanced in the VM overcommitment scenario
Date: Fri, 13 Sep 2024 17:13:21 +0800
Message-ID: <67060b2c-b2bb-b01e-c24a-6346e9ccc1fb@huawei.com>
In a VM overcommitment scenario with an overcommitment ratio of 1:2, 8
CPUs are overcommitted to 2 x 8u VMs, so 16 vCPUs are bound to 8 CPUs.
However, one VM gets only 2 CPUs' worth of resources, while the other VM
gets 6 CPUs' worth.
The host has 80 CPUs in a sched domain, and the other CPUs are idle.
The root cause is that the load on the host is unbalanced: some vCPUs
occupy a CPU exclusively.
When the CPU that triggers load balancing calculates the imbalance value,
it gets env->imbalance = 0 because local->avg_load > sds->avg_load. As a
result, load balancing fails.
(https://github.com/torvalds/linux/commit/91dcf1e8068e9a8823e419a7a34ff4341275fb70)
This is normal from the point of view of kernel load balancing, but it is
not reasonable from the perspective of VM users.
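For readers who do not have the code in front of them, here is a minimal sketch of the check referred to above. It is a simplified model, not the kernel's calculate_imbalance(): the struct and function names (group_stats, domain_stats, sketch_imbalance) are made up for illustration, and the sketch uses >= for the early return, which also covers the local->avg_load > sds->avg_load case described here.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical, trimmed-down stand-ins for the kernel's per-group and
 * per-domain load-balance statistics. */
struct group_stats {
	unsigned long avg_load;       /* group load relative to its capacity */
	unsigned long group_capacity; /* summed capacity of the group's CPUs */
};

struct domain_stats {
	unsigned long avg_load;       /* average load over the whole sched domain */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Simplified model: how much load the local group would try to pull. */
unsigned long sketch_imbalance(const struct group_stats *local,
			       const struct group_stats *busiest,
			       const struct domain_stats *sds)
{
	assert(local->group_capacity > 0 && busiest->group_capacity > 0);

	/* Local group is already at or above the domain average: pull nothing. */
	if (local->avg_load >= sds->avg_load)
		return 0;

	/* Guard added for this sketch only, to avoid unsigned underflow below. */
	if (busiest->avg_load <= sds->avg_load)
		return 0;

	/* Move no more than what brings either group to the domain average. */
	return min_ul((busiest->avg_load - sds->avg_load) * busiest->group_capacity,
		      (sds->avg_load - local->avg_load) * local->group_capacity)
	       / SCHED_CAPACITY_SCALE;
}
```

With 16 busy tasks averaged over an 80-CPU domain, the domain average works out to roughly 16 * 1024 / 80 = 204, while a CPU running a single task is already at 1024. In this model such a CPU always takes the early return, regardless of how loaded the busiest group is.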
In cgroup v1, setting cpuset.sched_load_balance=0 to change the sched
domains fixes it.
Is there any other way to fix this problem? Thanks!
Abstracted reproduction case:
1. Environment information:
[root@localhost ~]# cat /proc/schedstat
cpu0
domain0 00000000,00000000,00010000,00000000,00000001
domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
cpu1
domain0 00000000,00000000,00020000,00000000,00000002
domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
cpu2
domain0 00000000,00000000,00040000,00000000,00000004
domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
cpu3
domain0 00000000,00000000,00080000,00000000,00000008
domain1 00000000,00ffffff,ffff0000,000000ff,ffffffff
domain2 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
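The domain masks above are comma-separated 32-bit hex words with the most significant word first, so the rightmost word covers CPUs 0-31. A small helper to decode them is sketched below; decode_cpumask and MAX_CPUS are names invented for this example, not kernel or libc APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 1024

/*
 * Decode a cpumask string as printed in /proc/schedstat.
 * Sets is_set[cpu] = 1 for every CPU in the mask and returns the number
 * of CPUs set, or -1 if the string is malformed or too long.
 */
int decode_cpumask(const char *mask, unsigned char is_set[MAX_CPUS])
{
	assert(mask != NULL);
	memset(is_set, 0, MAX_CPUS);

	/* Count the words first: the leftmost word maps to the highest CPUs. */
	int nwords = 1;
	for (const char *p = mask; *p; p++)
		if (*p == ',')
			nwords++;
	if (nwords * 32 > MAX_CPUS)
		return -1;

	int count = 0;
	const char *p = mask;
	for (int w = nwords - 1; w >= 0; w--) {
		char *end;
		unsigned long word = strtoul(p, &end, 16);

		if (end == p)
			return -1; /* no hex digits where a word was expected */

		for (int bit = 0; bit < 32; bit++) {
			if (word & (1UL << bit)) {
				is_set[w * 32 + bit] = 1;
				count++;
			}
		}
		/* Step over the separator, if any, to reach the next word. */
		p = (*end == ',') ? end + 1 : end;
	}
	return count;
}
```

Decoded this way, domain0 of cpu0 contains CPUs 0 and 80 (presumably SMT siblings), domain1 contains CPUs 0-39 and 80-119, which matches the 80 CPUs mentioned above, and domain2 contains all 160 CPUs.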
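2. Test case:
vcpu.c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Sleep first so that test.sh can attach this task to its cpuset
	 * before it starts spinning. */
	sleep(20);
	/* Then spin forever, emulating a fully busy vCPU thread. */
	while (1)
		;
	return 0;
}
gcc vcpu.c -o vcpu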
-----------------------------------------------------------------
test.sh
#!/bin/bash
# vcpu1: first "VM", 8 busy tasks confined to CPUs 0-3 and 80-83
mkdir /sys/fs/cgroup/cpuset/vcpu_1
echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/vcpu_1/cpuset.mems
for i in {1..8}
do
    ./vcpu &
    pid=$!
    sleep 1
    echo $pid > /sys/fs/cgroup/cpuset/vcpu_1/tasks
done
# vcpu2: second "VM", 8 more busy tasks confined to the same 8 CPUs
mkdir /sys/fs/cgroup/cpuset/vcpu_2
echo '0-3, 80-83' > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/vcpu_2/cpuset.mems
for i in {1..8}
do
    ./vcpu &
    pid=$!
    sleep 1
    echo $pid > /sys/fs/cgroup/cpuset/vcpu_2/tasks
done
------------------------------------------------------------------
[root@localhost ~]# ./test.sh
[root@localhost ~]# top -d 1 -c -p $(pgrep -d',' -f vcpu)
14591 root 20 0 2448 1012 928 R 100.0 0.0 13:10.73 ./vcpu
14582 root 20 0 2448 1012 928 R 100.0 0.0 13:12.71 ./vcpu
14606 root 20 0 2448 872 784 R 100.0 0.0 13:09.72 ./vcpu
14620 root 20 0 2448 916 832 R 100.0 0.0 13:07.72 ./vcpu
14622 root 20 0 2448 920 836 R 100.0 0.0 13:06.72 ./vcpu
14629 root 20 0 2448 920 832 R 100.0 0.0 13:05.72 ./vcpu
14643 root 20 0 2448 924 836 R 21.0 0.0 2:37.13 ./vcpu
14645 root 20 0 2448 868 784 R 21.0 0.0 2:36.51 ./vcpu
14589 root 20 0 2448 900 816 R 20.0 0.0 2:45.16 ./vcpu
14608 root 20 0 2448 956 872 R 20.0 0.0 2:42.24 ./vcpu
14632 root 20 0 2448 872 788 R 20.0 0.0 2:38.08 ./vcpu
14638 root 20 0 2448 924 840 R 20.0 0.0 2:37.48 ./vcpu
14652 root 20 0 2448 928 844 R 20.0 0.0 2:36.42 ./vcpu
14654 root 20 0 2448 924 840 R 20.0 0.0 2:36.14 ./vcpu
14663 root 20 0 2448 900 816 R 20.0 0.0 2:35.38 ./vcpu
14669 root 20 0 2448 868 784 R 20.0 0.0 2:35.70 ./vcpu
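One task placement that would reproduce these numbers, assuming equal-weight CPU-bound tasks that split a CPU evenly: six of the eight CPUs run one task each and the remaining two CPUs run five tasks each. This is an inference from the top output, not a per-CPU measurement, and task_shares is a hypothetical helper written only for this illustration.

```c
#include <assert.h>

/*
 * Idealized model: tasks of equal weight on the same run queue share that
 * CPU evenly. nr_running[i] is the number of tasks placed on CPU i.
 * Writes one entry per task into shares[] (percent of one CPU) and returns
 * the number of tasks, or -1 if shares[] is too small.
 */
int task_shares(const int *nr_running, int nr_cpus, double *shares, int max_tasks)
{
	assert(nr_running != 0 && shares != 0 && nr_cpus > 0);

	int n = 0;
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		for (int t = 0; t < nr_running[cpu]; t++) {
			if (n >= max_tasks)
				return -1;
			/* Each task on this CPU gets an equal slice of it. */
			shares[n++] = 100.0 / nr_running[cpu];
		}
	}
	return n;
}
```

Passing the placement {1, 1, 1, 1, 1, 1, 5, 5} gives six tasks at 100% and ten tasks at 20%, close to what top reports. An even placement of two tasks per CPU would give every task 50%.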