From: "Longpeng (Mike)" <longpeng2@huawei.com>
To: <peterz@infradead.org>, <pjt@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kvm <kvm@vger.kernel.org>
Cc: Wanpeng Li <kernellwp@gmail.com>,
	Xiexiangyou <xiexiangyou@huawei.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	Gonglei <arei.gonglei@huawei.com>,
	"weiqi (C)" <weiqi4@huawei.com>
Subject: [ RFC ] Set quota on VM cause large schedule latency of vcpu
Date: Tue, 17 Jul 2018 15:05:01 +0800
Message-ID: <5B4D951D.9050504@huawei.com>

The virtual machine has the following cgroup hierarchy:

               root
                |
              vm_tg
              (cfs_rq)
              /    \
            (se)    (se)
            tg_A    tg_B
          (cfs_rq)    (cfs_rq)
            /          \
          (se)          (se)
          a                b

'a' and 'b' are two vcpus of the VM.

We set a CFS quota on vm_tg, and the scheduling latency of the vcpus (a/b) can
become very large, more than 2 s in some cases.
We used perf sched to capture the latency (perf sched record -a sleep 10;
perf sched lat -p --sort=max), and the result is as follows:

Task     | Runtime ms | Switches | Average delay ms | Maximum delay ms |
------------------------------------------------------------------------
CPU 0/KVM| 260.261 ms |       50 | avg:   82.017 ms | max: 2510.990 ms |
...

We tested the latest kernel and the result is the same.
We added some tracepoints and found that the following sequence causes the issue:

1) 'a' is the only task in tg_A. When 'a' goes to sleep (e.g. vcpu halt), tg_A
is dequeued and tg_A->se->load.weight is set to MIN_SHARES.

2) 'b' continues running and then triggers throttling, so
tg_A->cfs_rq->throttle_count=1.

3) Something wakes up 'a' (e.g. the vcpu receives a virq). When tg_A is
enqueued, tg_A->se->load.weight cannot be updated because
tg_A->cfs_rq->throttle_count=1.

4) After one CFS quota period, vm_tg is unthrottled.

5) 'a' starts running.

6) After one tick, when tg_A->se's vruntime is updated, tg_A->se->load.weight
is still MIN_SHARES, so tg_A->se's vruntime grows by a large amount.

7) This causes 'a' to see a large scheduling latency.


We *crudely* removed the check that prevents tg_A->se->load.weight from being
reweighted in step 3, as shown below, and the problem disappeared:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be..348ccd6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3016,9 +3016,6 @@ static void update_cfs_group(struct sched_entity *se)
        if (!gcfs_rq)
                return;

-       if (throttled_hierarchy(gcfs_rq))
-               return;
-
#ifndef CONFIG_SMP
        runnable = shares = READ_ONCE(gcfs_rq->tg->shares);


So do you have any suggestions on this problem? Is there a better way to fix
it?

-- 
Regards,
Longpeng(Mike)
