public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	mingo@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\
Date: Wed, 27 Sep 2017 21:44:54 -0300	[thread overview]
Message-ID: <20170928004452.GA30040@amt.cnet> (raw)
In-Reply-To: <6f4afefd-8726-13ff-371e-0d3896b4cf6a@redhat.com>

On Wed, Sep 27, 2017 at 11:37:48AM +0200, Paolo Bonzini wrote:
> On 27/09/2017 00:49, Marcelo Tosatti wrote:
> > On Mon, Sep 25, 2017 at 05:12:42PM +0200, Paolo Bonzini wrote:
> >> On 25/09/2017 11:13, Peter Zijlstra wrote:
> >>> On Sun, Sep 24, 2017 at 11:57:53PM -0300, Marcelo Tosatti wrote:
> >>>> I think you are missing the following point:
> >>>>
> >>>> "vcpu0 can be interrupted when its not in a spinlock protected section, 
> >>>> otherwise it can't."
> >>
> >> Who says that?  Certainly a driver can dedicate a single VCPU to
> >> periodic polling of the device, in such a way that the polling does not
> >> require a spinlock.
> > 
> > This sequence:
> > 
> > 
> > VCPU-0					VCPU-1 (running realtime workload)
> > 
> > takes spinlock A
> > scheduled out				
> > 					spinlock(A) (busy spins until
> > 						     VCPU-0 is scheduled
> > 						     back in)
> > scheduled in
> > finishes execution of 
> > code under protected section
> > releases spinlock(A)			
> > 
> > 					takes spinlock(A)
> 
> Yes, but then you have
> 
>                                         busy waits for flag to be set
>   polls device
>   scheduled out
>   device receives data
>                                         ...
>   scheduled in
>   set flag
> 
> or
> 
>   check for work to do
>   scheduled out
>                                         submit work
>                                         busy waits for work to complete
>   scheduled in
>   do work
> 
> None of which have anything to do with spinlocks.  They're just
> different ways to get priority inversion, and this...
> 
> >>>> So this point of "everything needs to be RT and the priorities must be
> >>>> designed carefully", is this: 
> >>>>
> >>>> 	WHEN in spinlock protected section (more specifically, when 
> >>>> 	spinlock protected section _shared with realtime vcpus_),
> >>>>
> >>>> 	priority of vcpu0 > priority of emulator thread
> >>>>
> >>>> 	OTHERWISE
> >>>>
> >>>> 	priority of vcpu0 < priority of emulator thread.
> >>
> >> This is _not_ designed carefully, this is messy.
> > 
> > This is very precise to me. What is "messy" about it? (its clearly
> > defined).
> 
> ... it's not how you design RT systems to avoid priority inversion.
> It's just solving an instance of the issue.
> 
> >> The emulator thread can interrupt the VCPU thread, so it has to be at
> >> higher RT priority (+ priority inheritance of mutexes).  
> > 
> > It can only do that _when_ the VCPU thread is not running a critical
> > section which a higher priority task depends on.
> 
> All VCPUs must have the same priority.  There's no such thing as a
> housekeeping VCPU.
> 
> There could be one sacrificial VCPU that you place on the same physical
> CPU as the emulator thread, but that's it.  The whole system must run
> smoothly even if you place those on the same physical CPU, without any
> PV hacks.
> 
> > So when you say "The emulator thread can interrupt the VCPU thread", 
> > you're saying that it has to be modified to interrupt for a maximum
> > amount of time (say 15us).
> > 
> > Is that what you are suggesting?
> 
> This is correct.  But I think you are missing a fundamental point, that
> is not specific to virt.  If a thread 1 can trigger an event in thread
> 2, and thread 2 runs at priority N, thread 1 must be placed at priority >N.
> 
> In this case, the emulator thread can signal I/O completion to the VCPU:
> it doesn't matter if the VCPU is polling or using interrupts, the
> emulator thread must be placed at higher priority than the VCPU.  This
> is the root cause of the SeaBIOS issue that you were seeing.  Yes, it
> was me who suggested moving VCPU0 and the emulator thread to
> SCHED_NORMAL, but that was just a bandaid until the real fix was done
> (which is to set up SCHED_FIFO priorities correctly and use PI mutexes).
> 
> In fact, this is not specific to the emulator thread.  It applies just
> as well to vhost threads, to QEMU iothreads, even to the KVM PIT
> emulation thread if the guest were using it.
> 
> > Paolo, you don't control how many interruptions of the emulator thread
> > happen per second.
> 
> Indeed these non-VCPU threads should indeed run rarely and for a small
> amount of time only to achieve bounded latencies in the VCPUs.  But if
> they don't, those are simply bugs and we fix them.  In fact, sources of
> frequent interruptions have all been fixed or moved outside the main
> thread; for example, disks can use separate iothreads.  Configuration
> problems are also bugs, just in a different place.


> 
> For a very basic VM that I've just tried (whatever QEMU does by default,
> only tweak was not using the QEMU GUI), there is exactly one
> interruption per second when the VM is idle.  That interruption in turn
> is caused by udev periodically probing the guest CDROM (!).  So that's a
> configuration problem: remove the CDROM, and it does one interruption
> every 10-30 seconds, again all of them guest triggered.
> 
> Again: if you have many interruptions, it's not a flaw in KVM or QEMU's
> design, it's just that someone is doing something stupid.  It could be
> the guest (e.g. unnecessary devices or daemons as in the example above),
> QEMU (e.g. the RTC emulation used to trigger QEMU timers twice a second
> just to increment the clock), or the management (e.g. polling "is the VM
> running" 50 times per second).  But it can and must be fixed.

No, i mean you can run anything in VCPU-0 (it is valid to do that).
And that "anything" can generate 1 interrupt per second, 1000 or 10.000
interrupts per second. Which are all valid things to be done.

"I can't run a kernel compilation on VCPU-0 because that will impact
latency on the realtime VCPU-1" is not acceptable.

> The design should be the basic one: attribute the right priority to each
> thread and things just work.
> 
> Paolo
> 
> > So if you let the emulator thread interrupt the
> > emulator thread at all times, without some kind of bounding 
> > of these interruptions per time unit, you have a similar
> > problem as (*) (where the realtime task is scheduled).
> > 

  reply	other threads:[~2017-09-28  0:44 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-21 11:38 [patch 0/3] KVM KVM_HC_RT_PRIO hypercall support Marcelo Tosatti
2017-09-21 11:38 ` [patch 1/3] KVM: x86: add per-vcpu option to set guest vcpu -RT priority Marcelo Tosatti
2017-09-21 11:38 ` [patch 2/3] KVM: x86: KVM_HC_RT_PRIO hypercall (host-side) Marcelo Tosatti
2017-09-21 13:32   ` Konrad Rzeszutek Wilk
2017-09-21 13:49     ` Paolo Bonzini
2017-09-22  1:08       ` Marcelo Tosatti
2017-09-22  7:23         ` Paolo Bonzini
2017-09-22 12:24           ` Marcelo Tosatti
2017-09-21 11:38 ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall Marcelo Tosatti
2017-09-21 13:36   ` Konrad Rzeszutek Wilk
2017-09-21 14:06     ` Peter Zijlstra
2017-09-22  1:10       ` Marcelo Tosatti
2017-09-22 10:00         ` Peter Zijlstra
2017-09-22 10:56           ` Peter Zijlstra
2017-09-22 12:33             ` Marcelo Tosatti
2017-09-22 12:55               ` Peter Zijlstra
2017-09-23 10:56                 ` Paolo Bonzini
2017-09-23 13:41                   ` Peter Zijlstra
2017-09-24 13:05                     ` Paolo Bonzini
2017-09-25  2:57                       ` Marcelo Tosatti
2017-09-25  9:13                         ` Peter Zijlstra
2017-09-25 15:12                           ` Paolo Bonzini
2017-09-26 22:49                             ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\ Marcelo Tosatti
2017-09-27  9:37                               ` Paolo Bonzini
2017-09-28  0:44                                 ` Marcelo Tosatti [this message]
2017-09-28  7:22                                   ` Paolo Bonzini
2017-09-28 21:35                                     ` Marcelo Tosatti
2017-09-28 21:41                                       ` Marcelo Tosatti
2017-09-29  8:18                                       ` Paolo Bonzini
2017-09-29 16:40                                         ` Marcelo Tosatti
2017-09-29 17:05                                           ` Paolo Bonzini
2017-09-29 20:17                                             ` Marcelo Tosatti
2017-10-02 12:30                                               ` Paolo Bonzini
2017-10-02 12:48                                                 ` Peter Zijlstra
2017-09-26 23:22                           ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall Marcelo Tosatti
2017-09-25 16:20                         ` Konrad Rzeszutek Wilk
2017-09-22 12:16           ` Marcelo Tosatti
2017-09-22 12:31             ` Peter Zijlstra
2017-09-22 12:36               ` Marcelo Tosatti
2017-09-22 12:59                 ` Peter Zijlstra
2017-09-25  1:52                   ` Marcelo Tosatti
2017-09-25  8:35                     ` Peter Zijlstra
2017-09-22 12:40               ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\ Marcelo Tosatti
2017-09-22 13:01                 ` Peter Zijlstra
2017-09-25  2:22                   ` Marcelo Tosatti
2017-09-25  8:58                     ` Peter Zijlstra
2017-09-25 10:41                     ` Thomas Gleixner
2017-09-25 18:28                       ` Jan Kiszka
2017-09-21 17:45 ` [patch 0/3] KVM KVM_HC_RT_PRIO hypercall support Jan Kiszka
2017-09-22  1:19   ` Marcelo Tosatti
2017-09-22  6:23     ` Jan Kiszka
2017-09-26 23:59       ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170928004452.GA30040@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox