Re: Mask bit support's API

From: Avi Kivity <avi@redhat.com>
To: "Yang, Sheng" <sheng.yang@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: Mask bit support's API
Date: Tue, 30 Nov 2010 16:15:29 +0200
Message-ID: <4CF50701.8010903@redhat.com>
In-Reply-To: <201011261035.09135.sheng.yang@intel.com>

On 11/26/2010 04:35 AM, Yang, Sheng wrote:
> >  >
> >  >  Shouldn't kvm also service reads from the pending bitmask?
> >
> >  Of course KVM should service reads from the pending bitmask. For an
> >  assigned device, it's the kernel that sets the pending bit, but I am not
> >  sure about virtio. This interface is GET_ENTRY, so reading is fine with it.

The kernel should manage it in the same way.  Virtio raises irq (via 
KVM_IRQ_LINE or vhost-net's irqfd), kernel sets pending bit.

Note we need to be able to read and write the pending bitmask for live 
migration.
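
A minimal sketch of that bookkeeping, under stated assumptions: this is not
code from any patch in this thread, and struct msix_state and the msix_*
helpers are hypothetical names made up for illustration. It shows an
interrupt raised on a masked vector being latched in the pending bitmask,
delivery on unmask, and a get/set pair of the kind live migration would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSIX_MAX_VECTORS 2048                 /* MSI-X allows at most 2048 vectors */
#define MSIX_WORDS (MSIX_MAX_VECTORS / 64)

/* Hypothetical per-device state; not an existing KVM structure. */
struct msix_state {
    uint64_t masked[MSIX_WORDS];              /* per-vector mask bits */
    uint64_t pending[MSIX_WORDS];             /* pending bitmask */
};

static bool bit_test(const uint64_t *map, unsigned v)
{
    return (map[v / 64] >> (v % 64)) & 1;
}

static void bit_assign(uint64_t *map, unsigned v, bool on)
{
    if (on)
        map[v / 64] |= 1ULL << (v % 64);
    else
        map[v / 64] &= ~(1ULL << (v % 64));
}

/* Device raises a vector. Returns true if it should be delivered now;
 * if the vector is masked, the interrupt is latched as pending instead. */
bool msix_raise(struct msix_state *s, unsigned vec)
{
    assert(vec < MSIX_MAX_VECTORS);
    if (bit_test(s->masked, vec)) {
        bit_assign(s->pending, vec, true);
        return false;
    }
    return true;
}

/* Guest writes the mask bit. Returns true if unmasking released a
 * pending interrupt that must now be delivered. */
bool msix_set_mask(struct msix_state *s, unsigned vec, bool masked)
{
    assert(vec < MSIX_MAX_VECTORS);
    bit_assign(s->masked, vec, masked);
    if (!masked && bit_test(s->pending, vec)) {
        bit_assign(s->pending, vec, false);
        return true;
    }
    return false;
}

/* Read and write the whole pending bitmask, e.g. on the source and
 * destination side of a migration. */
void msix_get_pending(const struct msix_state *s, uint64_t out[MSIX_WORDS])
{
    memcpy(out, s->pending, sizeof(s->pending));
}

void msix_set_pending(struct msix_state *s, const uint64_t in[MSIX_WORDS])
{
    memcpy(s->pending, in, sizeof(s->pending));
}
```

The point of the get/set pair is that a pending bit latched on the source
has to survive the move, so that the destination still delivers the
interrupt when the guest unmasks the vector.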

> >  >
> >  >  We could have the kernel handle addr/data writes by setting up an
> >  >  internal interrupt routing.  A disadvantage is that more work is needed
> >  >  if we emulate interrupt remapping in qemu.
> >
> >  In fact modifying irq routing in the kernel is also the thing I want to
> >  avoid.
> >
> >  So, the flow would be:
> >
> >  kernel gets an MMIO write, records it in its own MSI table
> >  KVM exits to QEmu, with one specific exit reason
> >  QEmu knows it has to sync the MSI table, then reads the entries from
> >  the kernel
> >  QEmu finds it's a write, so it needs to reprogram the irq routing table
> >  using the entries above
> >  done
> >
> >  But wait, why should qemu read entries from the kernel? With the default
> >  exit we already have the information about which entry to modify and what
> >  to write, so we can use it directly. This way, we also don't need a
> >  specific exit reason - just exiting to qemu in the normal way is fine.

Because we have an interface where you get an exit if (addr % 4) < 3 and 
don't get an exit if (addr % 4) == 3.  There is a gpa range which is 
partially maintained by the kernel and partially in userspace.  It's a 
confusing interface.  Things like 64-bit reads or writes need to be 
broken up and serviced in two different places.

We already need to support this (for unaligned writes which hit two 
regions), but let's at least make a contiguous region behave sanely.
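
To make the split concrete, here is a minimal sketch, assuming "addr" above
counts 32-bit words within an MSI-X entry (four words of 16 bytes total:
address low, address high, data, vector control) and that only the vector
control word is handled in the kernel. classify_dword(), split_access() and
struct mmio_piece are hypothetical names, not existing KVM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE  16u   /* bytes per MSI-X table entry */
#define MSIX_VECTOR_CTRL 12u   /* byte offset of the vector control word */

enum handler { HANDLER_USERSPACE, HANDLER_KERNEL };

struct mmio_piece {
    uint32_t offset;           /* byte offset into the MSI-X table */
    uint32_t len;              /* 4 or 8 bytes */
    enum handler who;
};

/* Words 0..2 of an entry (addr/data) exit to userspace; word 3
 * (vector control, which holds the mask bit) stays in the kernel. */
enum handler classify_dword(uint32_t offset)
{
    return (offset % MSIX_ENTRY_SIZE) == MSIX_VECTOR_CTRL
               ? HANDLER_KERNEL : HANDLER_USERSPACE;
}

/* Split a dword-aligned 4- or 8-byte access into the pieces each side
 * has to service. Returns the number of pieces written to out (1 or 2). */
size_t split_access(uint32_t offset, uint32_t len, struct mmio_piece out[2])
{
    assert(offset % 4 == 0 && (len == 4 || len == 8));

    enum handler first = classify_dword(offset);

    if (len == 4 || classify_dword(offset + 4) == first) {
        out[0] = (struct mmio_piece){ offset, len, first };
        return 1;
    }
    out[0] = (struct mmio_piece){ offset, 4, first };
    out[1] = (struct mmio_piece){ offset + 4, 4, classify_dword(offset + 4) };
    return 2;
}
```

With this layout a 64-bit write at offset 0 stays in one piece, but one at
offset 8 (data + vector control) or at offset 12 (vector control + the next
entry's address) is serviced half in the kernel and half in userspace.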

> >
> >  Then it would be:
> >
> >  kernel gets an MMIO write, records it in its own MSI table
> >  KVM exits to QEmu, indicating an MMIO exit
> >  QEmu finds it's a write, updates its own MSI table (may need to
> >  query the mask bit from the kernel), and reprograms the irq routing table
> >  using the entries above
> >  done
> >
> >  Then why should the kernel keep its own MSI table? I think the only reason
> >  is that we can speed up reads that way - but the reads we want to speed up
> >  are mostly on the enabled entry (the first entry), which is already in the
> >  IRQ routing table...

The reason is to keep a sane interface.  Like we emulate instructions 
and msrs in the kernel and don't do half a job.  I don't think there's a 
real need to accelerate the first three words of an msi-x entry.

> >  And for enabled/disabled entries, you can see it like this: entries
> >  inside the routing table are considered enabled; all others are disabled.
> >  Then you don't need to be bothered by pci_enable_msix().
> >
> >  So our strategy for accelerating reads can be:
> >
> >  If the entry is contained in the irq routing table, then use it; otherwise
> >  let qemu deal with it. Because it's QEmu that owns the irq routing table,
> >  synchronization is guaranteed. We don't need the MSI table in the kernel
> >  then.

I agree about letting qemu manage the irq routing table.  It changes 
very rarely.  I just prefer to let it know about the change via 
something other than KVM_EXIT_MMIO.


> >
> >  And for writes, we just want to cover all of the mask bits, but none of
> >  the others.
> >
> >  I think the concept here is more acceptable?
> >
> >  The issue here is that the MSI table and the irq routing table hold
> >  duplicate information for some entries. My initial proposal is to use the
> >  irq routing table in the kernel, so we don't need to duplicate information.
>
> Avi?

Sorry about the late reply.

> And BTW, we can treat the routing table as a kind of *cache*: if the content is
> in the cache, we fetch it from there; otherwise we need to go back and fetch
> it from memory (userspace).

If it's guaranteed by the spec that addr/data pairs are always 
interpreted in the same way, sure.  But there's no reason to do it, 
really; it isn't a fast path.

-- 
error compiling committee.c: too many arguments to function
