qemu-devel.nongnu.org archive mirror
From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: qemu-devel <qemu-devel@nongnu.org>, KVM list <kvm@vger.kernel.org>
Subject: [Qemu-devel] Re: [RFC] Moving the kvm ioapic, pic, and pit back to userspace
Date: Mon, 07 Jun 2010 21:42:08 +0300
Message-ID: <4C0D3D80.5060208@redhat.com>
In-Reply-To: <4C0D26B5.9030708@codemonkey.ws>

On 06/07/2010 08:04 PM, Anthony Liguori wrote:
>
> I think we could also move the local APIC.

I'm not even sure we can safely move the ioapic/pic (mostly due to 
churn).  But the local APIC is so heavily accessed by the guest that 
it's impossible to move it.  Run an ftrace one day, especially on an SMP 
guest.  Every IPI requires several APIC accesses.  Before a halt, a 
tickless kernel sets the wakeup timer.  And every interrupt ends with an 
EOI write.
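The sketch below puts rough numbers on this. It models a local APIC in which every MMIO access exits from the guest, then counts the exits for one IPI, one EOI, and arming a one-shot wakeup timer. The register offsets are the standard xAPIC ones. The `trapping_apic` model, its instant IPI delivery, and the function names are assumptions made for illustration; none of this is KVM or Linux code.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC MMIO register offsets (Intel SDM Vol. 3, local APIC chapter). */
enum {
    APIC_EOI    = 0x0B0,
    APIC_ICR_LO = 0x300,   /* writing this register sends the IPI */
    APIC_ICR_HI = 0x310,   /* destination field in bits 31:24 */
    APIC_TMICT  = 0x380,   /* timer initial count */
};

#define ICR_DELIVERY_PENDING (1u << 12)

/* Hypothetical model of a local APIC emulated outside the guest:
 * every register access is an exit the emulator has to service. */
struct trapping_apic {
    uint32_t regs[0x400 >> 4];  /* one slot per 16-byte-aligned register */
    unsigned exits;             /* accesses that left the guest */
};

static uint32_t apic_read(struct trapping_apic *a, uint32_t off)
{
    assert(off < 0x400 && (off & 0xF) == 0);
    a->exits++;
    return a->regs[off >> 4];
}

static void apic_write(struct trapping_apic *a, uint32_t off, uint32_t val)
{
    assert(off < 0x400 && (off & 0xF) == 0);
    a->exits++;
    a->regs[off >> 4] = val;
    /* The model delivers IPIs instantly, so the pending bit never stays set. */
    if (off == APIC_ICR_LO)
        a->regs[off >> 4] &= ~ICR_DELIVERY_PENDING;
}

/* Guest-side sequences, loosely modeled on what an SMP kernel does. */
static void send_ipi(struct trapping_apic *a, uint8_t dest, uint8_t vector)
{
    while (apic_read(a, APIC_ICR_LO) & ICR_DELIVERY_PENDING)
        ;                                              /* wait for ICR idle */
    apic_write(a, APIC_ICR_HI, (uint32_t)dest << 24);  /* physical destination */
    apic_write(a, APIC_ICR_LO, vector);                /* fixed delivery; sends */
}

static void ack_irq(struct trapping_apic *a)
{
    apic_write(a, APIC_EOI, 0);
}

static void arm_wakeup_timer(struct trapping_apic *a, uint32_t count)
{
    apic_write(a, APIC_TMICT, count);
}
```

In this model, one IPI costs three exits, and the receiver's EOI and timer re-arm cost one more each. If the local APIC lived in userspace, every one of those exits would become a round trip to the VMM instead of being handled inside the kernel.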

>
> To optimize device models, we've tended to put the full device model 
> in the kernel whereas the hardware vendors have tended to put only the 
> fast paths of the device models in hardware.
>
> For instance, we could introduce a userspace interface similar to 
> vapic support where a shared page that mapped the APIC's layout was 
> used with a mask to select which registers trapped on read/write.

That leads to very problematic interfaces.  When you separate along a 
device boundary, you have a spec that defines the software interfaces.  
When you separate along a boundary that you define, it's up to you to 
get everything right.
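The following is a minimal sketch of the kind of interface being proposed, so the trade-off is concrete. Everything in it is hypothetical: the page layout, the field names, and the per-register trap bitmaps were chosen for illustration, and this is not the existing kvm vapic ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_REG_SLOTS 64   /* xAPIC registers sit at 16-byte strides, 0x000-0x3F0 */
#define APIC_TPR       0x080
#define APIC_EOI       0x0B0

/* Hypothetical shared page: register contents plus per-register trap masks. */
struct shared_apic_page {
    uint32_t regs[APIC_REG_SLOTS];
    uint64_t trap_on_read;   /* bit n set: reads of slot n exit to userspace */
    uint64_t trap_on_write;  /* bit n set: writes to slot n exit to userspace */
};

enum access_result { HANDLED_IN_KERNEL, EXIT_TO_USERSPACE };

static unsigned slot(uint32_t off)
{
    assert(off < 0x400 && (off & 0xF) == 0);
    return off >> 4;
}

/* Fast path: serve the access from the page unless the mask says to trap. */
static enum access_result apic_access(struct shared_apic_page *p, uint32_t off,
                                      bool is_write, uint32_t *val)
{
    unsigned n = slot(off);
    uint64_t mask = is_write ? p->trap_on_write : p->trap_on_read;

    if (mask & (UINT64_C(1) << n))
        return EXIT_TO_USERSPACE;
    if (is_write)
        p->regs[n] = *val;   /* plain store: no side effects are emulated */
    else
        *val = p->regs[n];
    return HANDLED_IN_KERNEL;
}
```

The mask itself is trivial. The hard part is the semantics behind it. If `trap_on_write` ever lacks the EOI bit, an EOI write is "handled" as a plain store and the in-service interrupt is never retired. Rules like that one come from whoever designs the interface, not from a hardware spec.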

In fact, with the ioapic/pic/lapic, one of the problems is that the 
interconnection between the devices is not well defined, and that's 
where we have bugs.

>
> That said, I can understand an argument that the local APIC is part of 
> the CPU state since it's a very special type of device.
>
> A better example would be a generic counter kernel mechanism.  I can 
> envision such a device as doing nothing more than providing a 
> read-only view of a counter with a userspace configurable divider and 
> width.  Any write to the counter or read of any other byte outside the 
> counter register would result in a trap to userspace.

What about latches?  Byte access to word registers?  There will be as 
many special cases as there are timers.
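The sketch below sets the proposed counter view beside the read side of an 8254 PIT channel to show where the special cases come from. The `counter_view` structure is a hypothetical rendering of the "divider and width" idea. The PIT model is simplified: it handles only access mode "LSB then MSB" and a mode-2-like down-counter. The latch and flip-flop behavior follows the 8254 datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generic counter view: (ticks / divider) truncated to width bits. */
struct counter_view {
    uint64_t divider;
    unsigned width;   /* bits, 1..32; e.g. 24 for an ACPI PM timer */
};

static uint32_t counter_read(const struct counter_view *c, uint64_t host_ticks)
{
    assert(c->divider > 0 && c->width >= 1 && c->width <= 32);
    uint64_t mask = (UINT64_C(1) << c->width) - 1;
    return (uint32_t)((host_ticks / c->divider) & mask);
}

/* Simplified 8254 channel, read side only, access mode "LSB then MSB". */
struct pit_channel {
    uint16_t reload;       /* initial count; 0 (= 65536 on real hw) not modeled */
    bool latched;
    uint16_t latch_value;
    bool read_msb_next;    /* the per-channel LSB/MSB flip-flop */
};

/* Down-counter: reload, reload-1, ..., 1, then reload again. */
static uint16_t pit_live_count(const struct pit_channel *ch, uint64_t elapsed)
{
    assert(ch->reload != 0);
    return (uint16_t)(ch->reload - (elapsed % ch->reload));
}

/* Counter latch command (control word with RW = 00): freeze the count. */
static void pit_latch(struct pit_channel *ch, uint64_t elapsed)
{
    if (!ch->latched) {    /* a second latch before the read is ignored */
        ch->latch_value = pit_live_count(ch, elapsed);
        ch->latched = true;
    }
}

/* One 8-bit read from the channel's data port. */
static uint8_t pit_read_byte(struct pit_channel *ch, uint64_t elapsed)
{
    uint16_t v = ch->latched ? ch->latch_value : pit_live_count(ch, elapsed);
    uint8_t byte = ch->read_msb_next ? (uint8_t)(v >> 8) : (uint8_t)(v & 0xFF);

    if (ch->read_msb_next)
        ch->latched = false;   /* latch is released once both bytes are read */
    ch->read_msb_next = !ch->read_msb_next;
    return byte;
}
```

The simple view cannot express several things here. The PIT counts down rather than up. The flip-flop is per-channel state that persists between port accesses. The latch freezes a value across two separate reads, and without the latch the two bytes can tear; in the test below, for example, they read back as 1, a value the counter never held. A generic kernel mechanism would need a special case for each of these behaviors.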

If the kernel supported a bytecode/JIT facility, I'd happily use that to 
download portions of the device model into the kernel.

>
> That should allow both the PIT and the HPET to be accelerated with 
> minimal effort in the kernel.

IMO it's probably more effort than porting HPET to the kernel.  Try 
outlining an interface that supports PIT, HPET, RTC, and ACPI PMTIMER.

>
> I'd be in favor of a straight port to userspace.  We already have the 
> interfaces to communicate with an external device model for these 
> devices so let's just take the kernel code and stick it into dedicated 
> threads in userspace.

Currently we support an all-or-nothing approach.  I don't think a local 
APIC in userspace is worthwhile, especially since it would slow down 
vhost and assigned devices significantly: their interrupts would have 
to be mediated by userspace.

>
> I think it's easier to then work to merge the two bits of code in the 
> same tree than it is to try and take out-of-tree code and merge it 
> incrementally.

Are you talking about qemu.git/qemu-kvm.git?  That's the least of my 
concerns, I'm worried about kvm.git.

>
>> 5. Risk
>>
>> We may find out after all this is implemented that performance is not 
>> acceptable and all the work will have to be dropped.
>
> That's another advantage to a straight port to userspace.  We can 
> collect performance data with only a modest amount of engineering effort.

Port what exactly?  We have a userspace irqchip implementation.  What we 
don't have is just the ioapic/pic/pit in userspace, and the only way to 
try it out is to implement the whole thing.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Thread overview: 14+ messages
2010-06-07 15:26 [Qemu-devel] [RFC] Moving the kvm ioapic, pic, and pit back to userspace Avi Kivity
2010-06-07 16:31 ` [Qemu-devel] " David S. Ahern
2010-06-07 18:46   ` Avi Kivity
2010-06-07 18:54     ` David S. Ahern
2010-06-07 19:16       ` Avi Kivity
2010-06-07 17:04 ` Anthony Liguori
2010-06-07 18:42   ` Avi Kivity [this message]
2010-06-07 22:23     ` Anthony Liguori
2010-06-08  5:48       ` Avi Kivity
2010-06-09 15:59 ` [Qemu-devel] " Dong, Eddie
2010-06-09 16:05   ` [Qemu-devel] " Avi Kivity
2010-06-10  2:37     ` [Qemu-devel] " Dong, Eddie
2010-06-10  2:59       ` [Qemu-devel] " Avi Kivity
2010-06-10 14:42         ` [Qemu-devel] " Dong, Eddie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C0D3D80.5060208@redhat.com \
    --to=avi@redhat.com \
    --cc=anthony@codemonkey.ws \
    --cc=kvm@vger.kernel.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).