From: Avi Kivity <avi@redhat.com>
To: Rob Earhart <earhart@google.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [RFC] Next gen kvm api
Date: Sun, 05 Feb 2012 15:14:15 +0200
Message-ID: <4F2E80A7.5040908@redhat.com>
In-Reply-To: <CAB9FdM9M2DWXBxxyG-ez_5igT61x5b7ptw+fKfgaqMBU_JS5aA@mail.gmail.com>

On 02/03/2012 12:13 AM, Rob Earhart wrote:
> On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity <avi@redhat.com> wrote:
>
>     The kvm api has been accumulating cruft for several years now.
>     This is due to feature creep, fixing mistakes, experience gained by
>     the maintainers and developers on how to do things, ports to new
>     architectures, and simply as a side effect of a code base that is
>     developed slowly and incrementally.
>
>     While I don't think we can justify a complete revamp of the API
>     now, I'm writing this as a thought experiment to see where a
>     from-scratch API can take us.  Of course, if we do implement this,
>     the new and old APIs will have to be supported side by side for
>     several years.
>
>     Syscalls
>     --------
>     kvm currently uses the much-loved ioctl() system call as its entry
>     point.  While this made it easy to add kvm to the kernel
>     unintrusively, it does have downsides:
>
>     - overhead in the entry path, for the ioctl dispatch path and vcpu
>       mutex (low but measurable)
>     - semantic mismatch: kvm really wants a vcpu to be tied to a
>       thread, and a vm to be tied to an mm_struct, but the current API
>       ties them to file descriptors, which can move between threads and
>       processes.  We check that they don't, but we don't want to.
>
>     Moving to syscalls avoids these problems, but introduces new ones:
>
>     - adding new syscalls is generally frowned upon, and kvm will need
>       several
>     - syscalls into modules are harder and rarer than into core kernel
>       code
>     - will need to add a vcpu pointer to task_struct, and a kvm pointer
>       to mm_struct
>
>     Syscalls that operate on the entire guest will pick it up implicitly
>     from the mm_struct, and syscalls that operate on a vcpu will pick
>     it up from current.
>
>
> <snipped>
>
> I like the ioctl() interface.  If the overhead matters in your hot path,

I can't say that it's a pressing problem, but it's not negligible.

> I suspect you're doing it wrong;

What am I doing wrong?

> use irq fds & ioevent fds.  You might fix the semantic mismatch by
> having a notion of a "current process's VM" and "current thread's
> VCPU", and just use the one /dev/kvm filedescriptor.
>
> Or you could go the other way, and break the connection between VMs
> and processes / VCPUs and threads: I don't know how easy it is to do
> it in Linux, but a VCPU might be backed by a kernel thread, operated
> on via ioctl()s, indicating that they've exited the guest by having
> their descriptors become readable (and either use read() or mmap() to
> pull off the reason why the VCPU exited). 

That breaks the ability to renice vcpu threads (unless you want the user
to renice kernel threads).

> This would allow for a variety of different programming styles for the
> VMM--I'm a fan of CSP model myself, but that's hard to do with the
> current API.

Just convert the synchronous API to an RPC over a pipe, in the vcpu
thread, and you have the asynchronous model you asked for.
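
A minimal sketch of that suggestion, under stated assumptions: the vcpu
thread keeps calling a blocking run function and forwards each exit
reason over a pipe, so the rest of the VMM can wait on the read end with
select().  blocking_vcpu_run() and the exit-reason values are
hypothetical stand-ins, not the real KVM interface, and only the
exit-notification half of the RPC is shown.

```python
import os
import select
import struct
import threading


def blocking_vcpu_run(step):
    """Hypothetical stand-in for a synchronous 'run the vcpu' call.

    A real VMM would block in the kernel here until the guest exits;
    this stub just returns a made-up exit reason.
    """
    return 100 + step


def vcpu_thread(write_fd, steps):
    # The vcpu thread stays synchronous: run, then report the exit.
    for step in range(steps):
        reason = blocking_vcpu_run(step)
        os.write(write_fd, struct.pack("=I", reason))
    os.close(write_fd)  # EOF tells the reader this vcpu is finished


def collect_exits(steps=3):
    read_fd, write_fd = os.pipe()
    worker = threading.Thread(target=vcpu_thread, args=(write_fd, steps))
    worker.start()

    exits = []
    pending = b""
    while True:
        # The event loop sees vcpu exits as a readable descriptor.
        select.select([read_fd], [], [])
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        pending += chunk
        # Each message is one 4-byte unsigned exit reason.
        while len(pending) >= 4:
            (reason,) = struct.unpack("=I", pending[:4])
            exits.append(reason)
            pending = pending[4:]

    worker.join()
    os.close(read_fd)
    return exits
```

Calling collect_exits(3) gathers the three exit reasons in order.  A
second pipe in the opposite direction would carry requests back to the
vcpu thread.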

>
> It'd be nice to be able to kick a VCPU out of the guest without
> messing around with signals.  One possibility would be to tie it to an
> eventfd;

We have to support signals in any case; supporting more mechanisms just
increases complexity.

> another might be to add a pseudo-register to indicate whether the VCPU
> is explicitly suspended.  (Combined with the decoupling idea, you'd
> want another pseudo-register to indicate whether the VMM is implicitly
> suspended due to an intercept; a single "runnable" bit is racy if both
> the VMM and VCPU are setting it.)
>
> ioevent fds are definitely useful.  It might be cute if they could
> synchronously set the VIRTIO_USED_F_NOTIFY bit - the guest could do
> this itself, but that'd require giving the guest write access to the
> used side of the virtio queue, and I kind of like the idea that it
> doesn't need write access there.  Then again, I don't have any perf
> data to back up the need for this.
>

I'd hate to tie ioeventfds to virtio specifics; they're a general
mechanism.  Especially if the guest can do it itself.

-- 
error compiling committee.c: too many arguments to function
