From: Sean Christopherson <seanjc@google.com>
To: Haiwei Li <lihaiwei.kernel@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Haiwei Li <lihaiwei@tencent.com>
Subject: Re: [PATCH] kvm: lapic: add module parameters for LAPIC_TIMER_ADVANCE_ADJUST_MAX/MIN
Date: Fri, 12 Mar 2021 16:58:27 -0800
Message-ID: <YEwOM3aTeUjVim/i@google.com>
In-Reply-To: <CAB5KdOZkdXsLup+58On=LZ6eG4jYdcaK2NCt9U0Q-qy_6dQrfw@mail.gmail.com>

On Wed, Mar 10, 2021, Haiwei Li wrote:
> On Wed, Mar 10, 2021 at 7:42 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Mar 03, 2021, Haiwei Li wrote:
> > > On 21/3/3 10:09, lihaiwei.kernel@gmail.com wrote:
> > > > From: Haiwei Li <lihaiwei@tencent.com>
> > > >
> > > > In my test environment, advance_expire_delta is frequently greater than
> > > > the fixed LAPIC_TIMER_ADVANCE_ADJUST_MAX. And this will hinder the
> > > > adjustment.
> > >
> > > Supplementary details:
> > >
> > > I have tried to backport timer related features to our production
> > > kernel.
> > >
> > > After completing that, I found that advance_expire_delta is frequently
> > > greater than the fixed value. It's necessary to turn the fixed values
> > > into dynamically configurable ones.
> >
> > Does this reproduce on an upstream kernel? If so...
> >
> > 1. How much over the 10k cycle limit is the delta?
> > 2. Any idea what causes the large delta? E.g. is there something that can
> > and/or should be fixed elsewhere?
> > 3. Is it platform/CPU specific?
>
> Hi, Sean
>
> I have traced the flow on our production kernel and it frequently consumes more
> than 10K cycles from sched_out to sched_in.
> So two scenarios tested on Cascade lake Server(96 pcpu), v5.11 kernel.
>
> 1. only cyclictest in guest(88 vcpu and bound with isolated pcpus, w/o mwait
> exposed, adaptive advance lapic timer is default -1). The ratio of occurrences:
>
> greater_than_10k/total: 29/2060, 1.41%
>
> 2. cyclictest in guest(88 vcpu and not bound, w/o mwait exposed, adaptive
> advance lapic timer is default -1) and stress in host(no isolate). The ratio of
> occurrences:
>
> greater_than_10k/total: 122381/1017363, 12.03%

Hmm, I'm inclined to say this is working as intended. If the vCPU isn't affined
and/or it's getting preempted, then large spikes are expected, and not adjusting
in reaction to those spikes is desirable. E.g. adjusting by 20k cycles because
the timer happened to expire while a vCPU was preempted will cause KVM to busy
wait for quite a long time if the next timer runs without interference, and then
KVM will thrash the advancement.

And I don't really see the point in pushing the max adjustment beyond 10k. The
max _advancement_ is 5000ns, which means that even with a blazing fast 5.0GHz
system, a max adjustment of 1250 cycles (10k / 8, the step divisor) should get
KVM to the 25000 cycle advancement limit relatively quickly. Since KVM resets to
the initial 1000ns advancement when it would exceed the 5000ns max, I suspect
that raising the max adjustment much beyond 10k cycles would quickly push a vCPU
to the max, cause it to reset, and rinse and repeat.
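
To make the arithmetic above concrete, here is a rough userspace sketch of the
adjustment logic as I understand it. It is not the kernel function. The names
model_adjust() and steps_until_reset() are hypothetical, the adjust_max
parameter is a hypothetical stand-in for the proposed module parameter, and the
100 cycle minimum is an assumption (it is not stated anywhere in this thread).
The 10k cycle max, the step divisor of 8, and the 1000ns/5000ns values are the
ones discussed above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Values discussed in this thread, except ADJUST_MIN_CYCLES (assumed). */
#define ADJUST_MIN_CYCLES	100	/* assumption, not from the thread */
#define ADVANCE_NS_INIT		1000
#define ADVANCE_NS_MAX		5000
#define ADJUST_STEP		8

/*
 * Hypothetical model of a single adjustment.  delta_cycles > 0 means the
 * timer fired late, < 0 means early.  adjust_max is the spike cutoff in
 * cycles (10000 today; larger values model the proposed tunable).
 */
static uint32_t model_adjust(uint32_t advance_ns, int64_t delta_cycles,
			     uint32_t tsc_khz, uint64_t adjust_max)
{
	uint64_t mag = (uint64_t)llabs(delta_cycles);
	uint64_t ns;

	/* Ignore tiny fluctuations and large spikes (e.g. preemption). */
	if (mag > adjust_max || mag < ADJUST_MIN_CYCLES)
		return advance_ns;

	/* Convert cycles to ns, then apply only 1/ADJUST_STEP of it. */
	ns = mag * 1000000ULL / tsc_khz;
	if (delta_cycles < 0)
		advance_ns -= ns / ADJUST_STEP;
	else
		advance_ns += ns / ADJUST_STEP;

	/* Exceeding the max advancement resets to the initial value. */
	if (advance_ns > ADVANCE_NS_MAX)
		advance_ns = ADVANCE_NS_INIT;

	return advance_ns;
}

/*
 * Count how many consecutive identical "late" deltas it takes to hit a
 * reset, starting from the initial advancement.  Returns -1 if the delta
 * is filtered out (or no reset happens within 1000 steps).
 */
static int steps_until_reset(uint32_t tsc_khz, int64_t delta_cycles,
			     uint64_t adjust_max)
{
	uint32_t ns = ADVANCE_NS_INIT;
	int steps;

	for (steps = 1; steps <= 1000; steps++) {
		uint32_t next = model_adjust(ns, delta_cycles, tsc_khz,
					     adjust_max);

		if (next < ns)
			return steps;	/* wrapped back to the initial value */
		if (next == ns)
			return -1;	/* delta was ignored, no progress */
		ns = next;
	}
	return -1;
}
```

In this model, with a 5.0GHz TSC (tsc_khz = 5000000), a 20k cycle spike leaves
the advancement untouched, and a 10k cycle delta moves it by 250ns (1250
cycles). Back-to-back 10k cycle deltas reach the 5000ns cap after 16 steps and
reset on the 17th. With a hypothetical 40k cycle max and back-to-back 40k cycle
deltas, the reset happens on the 5th step.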

Note, we definitely don't want to raise the 5000ns max, as waiting with IRQs
disabled for any longer than that will likely cause system instability.