Re: [Qemu-devel] [PATCH] Add realtime option

From: Jan Kiszka <jan.kiszka@web.de>
To: Satoru Moriya <satoru.moriya@hds.com>
Cc: "dle-develop@lists.sourceforge.net"
	<dle-develop@lists.sourceforge.net>,
	Anthony Liguori <aliguori@us.ibm.com>,
	Seiji Aguchi <seiji.aguchi@hds.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] Add realtime option
Date: Sat, 03 Nov 2012 08:45:44 +0100
Message-ID: <5094CBA8.9090902@web.de>
In-Reply-To: <8631DC5930FA9E468F04F3FD3A5D007213990E4C@USINDEM103.corp.hds.com>

On 2012-11-03 05:43, Satoru Moriya wrote:
> We have some plans to migrate old enterprise/control systems which
> require low latency (msec order) to kvm virtualized environment.
> In order to satisfy the requirements, this patch adds realtime option
> to qemu:
> 
>  -realtime maxprio=<prio>,policy=<pol>
> 
> This option change the scheduling policy and priority to realtime one
> (only vcpu thread) as specified with argument and mlock all qemu and
> guest memory.

This patch breaks the win32 build. All the POSIX-specific code has to be
pushed into os-posix.c, for example. I'm introducing an os_prioritize()
function for that purpose, empty on win32.

Another question is how to pass the parameters around. I played with
several options and have so far ended up with

/* called by os_prioritize */
void qemu_init_realtime(int rt_sched_policy, int max_sched_priority);
/* called by threaded subsystems */
bool qemu_realtime_is_enabled(void);
void qemu_realtime_get_parameters(int *policy, int *max_priority);

all hosted by qemu-thread-*.c (empty/aborting on win32). This allows
subsystems to be adjusted to realtime without pushing all the parameters
into global variables.
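
As an illustration only, the POSIX side of these three functions could look
roughly like the sketch below. The file-local variables and the assertions
are assumptions; the mail gives only the prototypes, not an implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file-local state; not taken from the patch. */
static bool realtime_enabled;
static int realtime_policy;
static int realtime_max_priority;

/* Called once during startup, e.g. from os_prioritize(). */
void qemu_init_realtime(int rt_sched_policy, int max_sched_priority)
{
    realtime_policy = rt_sched_policy;
    realtime_max_priority = max_sched_priority;
    realtime_enabled = true;
}

/* Called by threaded subsystems to decide whether to go realtime. */
bool qemu_realtime_is_enabled(void)
{
    return realtime_enabled;
}

/* Called by threaded subsystems to fetch the configured values. */
void qemu_realtime_get_parameters(int *policy, int *max_priority)
{
    /* Assumption: asking for parameters before init is a caller bug. */
    assert(realtime_enabled);
    assert(policy != NULL && max_priority != NULL);
    *policy = realtime_policy;
    *max_priority = realtime_max_priority;
}
```

A subsystem would check qemu_realtime_is_enabled() first and only then call
qemu_realtime_get_parameters(), so the state stays private to one file.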

> 
> Of course, we need much more improvements to keep latency low in qemu
> virtualized environment and this is a first step. OTOH, we can meet the
> requirement of our first migration project with this patch.
> 
> These are basic performance test results:
> 
> Host : 4 core, 4GB, 3.7.0-rc3
> Guest: 1 core, 512MB, 3.6.3-1.fc17
> 
> Benchmark: cyclictest
> https://rt.wiki.kernel.org/index.php/Cyclictest
> 
> Command:
>  $ cyclictest -p 99 -n -m -q -l 100000
> 
> Results:
>  - no load (1:normal qemu, 2:realtime qemu)
>    1. T: 0 ( 544) P:99 I:1000 C:100000 Min: 11 Act: 32 Avg: 157 Max: 10029
>    2. T: 0 ( 449) P:99 I:1000 C:100000 Min: 16 Act: 30 Avg:  29 Max:   540
> 
>  - load (heavy network traffic) (3:normal qemu, 4: realtime qemu)
>    3. T: 0 (3455) P:99 I:1000 C:100000 Min: 10 Act: 38 Avg: 364 Max: 18394
>    4. T: 0 ( 493) P:99 I:1000 C:100000 Min: 12 Act: 21 Avg:  76 Max: 10796

What are the numbers with "chrt -f -p 99 <vcpu_tid>" compared to this?
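
For reference, a minimal sketch of what that chrt invocation roughly amounts
to on Linux: switching one existing thread to SCHED_FIFO at a given priority.
The helper names rt_priority_is_valid() and set_thread_fifo() are made up for
this example; only the sched_* calls are real. It needs CAP_SYS_NICE or a
suitable RLIMIT_RTPRIO to succeed.

```c
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: is prio within the valid range for this policy? */
bool rt_priority_is_valid(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    return lo >= 0 && hi >= 0 && prio >= lo && prio <= hi;
}

/*
 * Hypothetical helper: put the thread with the given tid under SCHED_FIFO.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_thread_fifo(pid_t tid, int prio)
{
    struct sched_param param;

    if (!rt_priority_is_valid(SCHED_FIFO, prio)) {
        errno = EINVAL;
        return -1;
    }
    memset(&param, 0, sizeof(param));
    param.sched_priority = prio;
    /* On Linux, passing a thread ID here affects only that thread. */
    return sched_setscheduler(tid, SCHED_FIFO, &param);
}
```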

My point is: This alone is not yet a good justification for the switch
and its current semantics. The approach of just raising the VCPU priority
is quite fragile without [V]CPU isolation. If you raise the VCPU over
its event threads, specifically the iothread, you risk starvation, e.g.
during boot (the BIOS will poll endlessly for PIT or disk). Yes, there is
/proc/sys/kernel/sched_rt_*, but this is what you typically disable when
doing realtime seriously, particularly if your guest doesn't idle during
operation.

The model I would propose for mainline first is different: maxprio goes
to the event threads, maxprio - 1 to all VCPUs (which means that maxprio
must be > 1). This setup is less likely to starve and makes more sense
(interrupts must have higher prio than CPUs).
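
The priority split described above can be sketched as a small pure function.
The struct and function names are invented for this example and are not part
of any patch; only the rule itself (event threads at maxprio, VCPUs at
maxprio - 1, maxprio > 1) comes from the proposal.

```c
#include <stdbool.h>

/* Hypothetical container for the two priority levels. */
struct rt_prio_plan {
    int event_prio; /* iothread and other event threads */
    int vcpu_prio;  /* all VCPU threads */
};

/*
 * Derive both levels from maxprio. Returns false if maxprio is too low
 * to leave room for a VCPU priority below the event threads.
 */
bool rt_plan_priorities(int maxprio, struct rt_prio_plan *plan)
{
    if (maxprio <= 1) {
        return false;
    }
    plan->event_prio = maxprio;
    plan->vcpu_prio = maxprio - 1;
    return true;
}
```

Each level would then be applied per thread, for instance with
pthread_setschedparam(), once the thread has been created.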

However, that's also not yet generic, as we will have scenarios where
only part of the event sources and VCPUs will be prioritized and the
rest shall remain low prio / SCHED_OTHER. Besides defining a way to
express such configurations, the problem is that they may not work
during guest boot. So some realtime profile switching concept may also
be needed. I haven't made up my mind on these issues yet. Not to mention
the horrible mess of configuring a PREEMPT-RT host...

What is clear, though, is that we need a reference showcase for
realtime QEMU/KVM. One that is as easy to reproduce as possible, doesn't
depend on proprietary realtime guests and clearly shows the advantages
of all the needed changes for a reasonable use case. I'd like to discuss
this at the RT-KVM BoF at the KVM Forum next week. Will you and/or any
of your colleagues be there?

Jan


Thread overview: 3+ messages
2012-11-03  4:43 [Qemu-devel] [PATCH] Add realtime option Satoru Moriya
2012-11-03  7:45 ` Jan Kiszka [this message]
2012-11-05 23:49   ` Satoru Moriya
