Re: Mask bit support's API - Michael S. Tsirkin

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: "Yang, Sheng" <sheng.yang@intel.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: Mask bit support's API
Date: Tue, 23 Nov 2010 14:56:16 +0200	[thread overview]
Message-ID: <20101123125616.GD26313@redhat.com> (raw)
In-Reply-To: <4CEBB7E5.5030601@redhat.com>

On Tue, Nov 23, 2010 at 02:47:33PM +0200, Avi Kivity wrote:
> On 11/23/2010 10:30 AM, Yang, Sheng wrote:
> >On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
> >>  On 11/23/2010 08:35 AM, Yang, Sheng wrote:
> >>  >  On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
> >>  >  >   On 11/23/2010 08:09 AM, Yang, Sheng wrote:
> >>  >  >   >   Hi Avi,
> >>  >  >   >
> >>  >  >   >   I've purposed the following API for mask bit support.
> >>  >  >   >
> >>  >  >   >   The main point is, QEmu can know which entries are enabled(by
> >>  >  >   >   pci_enable_msix()). And for enabled entries, kernel own it,
> >>  >  >   >   including MSI data/address and mask bit(routing table and mask
> >>  >  >   >   bitmap). QEmu should use KVM_GET_MSIX_ENTRY ioctl to get them(and
> >>  >  >   >   it can sync with them if it want to do so).
> >>  >  >   >
> >>  >  >   >   Before entries are enabled, QEmu can still use it's own MSI
> >>  >  >   >   table(because we didn't contain these kind of information in
> >>  >  >   >   kernel, and it's unnecessary for kernel).
> >>  >  >   >
> >>  >  >   >   The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to query
> >>  >  >   >   one entry didn't exist in kernel - or we can simply return -EINVAL
> >>  >  >   >   for it.
> >>  >  >   >
> >>  >  >   >   I suppose it would be rare for QEmu to use this interface to get
> >>  >  >   >   the context of entry(the only case I think is when MSI-X disable
> >>  >  >   >   and QEmu need to sync the context), so performance should not be
> >>  >  >   >   an issue.
> >>  >  >   >
> >>  >  >   >   What's your opinion?
> >>  >  >   >
> >>  >  >   >   >    #define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct
> >>  >  >   >   >    kvm_msix_entry)
> >>  >  >
> >>  >  >   Need SET_MSIX_ENTRY for live migration as well.
> >>  >
> >>  >  Current we don't support LM with VT-d...
> >>
> >>  Isn't this work useful for virtio as well?
> >
> >Yeah, but won't be included in this patchset.
> 
> What API changes are needed?  I'd like to see the complete API.
> 
> >>  >  >   What about the pending bits?
> >>  >
> >>  >  We didn't cover it here - and it's in another MMIO space(PBA). Of course
> >>  >  we can add more flags for it later.
> >>
> >>  When an entry is masked, we need to set the pending bit for it
> >>  somewhere.  I guess this is broken in the existing code (without your
> >>  patches)?
> >
> >Even with my patch, we didn't support the pending bit. It would always return 0
> >now. What we supposed to do(after my patch checked in) is to check IRQ_PENDING flag
> >of irq_desc->status(if the entry is masked), and return the result to userspace.
> >
> >That would involve some core change, like to export irq_to_desc(). I don't think
> >it would be accepted soon, so would push mask bit first.
> 
> The API needs to be compatible with the pending bit, even if we
> don't implement it now.  I want to reduce the rate of API changes.
> 
> >>
> >>  >  >   Also need a new exit reason to tell userspace that an msix entry has
> >>  >  >   changed, so userspace can update mappings.
> >>  >
> >>  >  I think we don't need it. Whenever userspace want to get one mapping
> >>  >  which is an enabled MSI-X entry, it can check it with the API
> >>  >  above(which is quite rare, because kernel would handle all of them when
> >>  >  guest is accessing them). If it's a disabled entry, the context inside
> >>  >  userspace MMIO record is the correct one(and only one). The only place I
> >>  >  think QEmu need to sync is when MSI-X is about to disabled, QEmu need to
> >>  >  update it's own MMIO record.
> >>
> >>  So in-kernel handling of mmio would be decided per entry?  I'm trying to
> >>  simplify this, and simplest thing is - all or nothing.
> >
> >So you would like to handle all MSI-X MMIO in kernel?
> 
> Yes.  Writes to address or data would be handled by:
> - recording it into the shadow msix table
> - notifying userspace that msix entry x changed
> Reads would be handled in kernel from the shadow msix table.
> 
> So instead of
> 
> - guest reads/writes msix
> - kvm filters mmio, implements some, passes others to userspace
> 
> we have
> 
> - guest reads/writes msix
> - kvm implements all
> - some writes generate an additional notification to userspace

One small proposal in addition: since all accesses are done from guest
anyway, the shadow table can/should be stored using userspace memory,
reducing the kernel memory overhead of the feature from up to 4K per
MSIX table to just 8 bytes.

Active entries can be cached in kernel memory.

> 
> -- 
> error compiling committee.c: too many arguments to function

next prev parent reply	other threads:[~2010-11-23 12:56 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-23  6:09 Mask bit support's API Yang, Sheng
2010-11-23  6:17 ` Avi Kivity
2010-11-23  6:35   ` Yang, Sheng
2010-11-23  7:54     ` Avi Kivity
2010-11-23  8:30       ` Yang, Sheng
2010-11-23 12:47         ` Avi Kivity
2010-11-23 12:56           ` Michael S. Tsirkin [this message]
2010-11-23 13:57           ` Yang, Sheng
2010-11-23 14:06             ` Avi Kivity
2010-11-23 15:11               ` Michael S. Tsirkin
2010-11-23 15:24                 ` Gleb Natapov
2010-11-23 16:10                   ` Michael S. Tsirkin
2010-11-24  1:59               ` Yang, Sheng
2010-11-26  2:35                 ` Yang, Sheng
2010-11-30 14:15                   ` Avi Kivity
2010-12-01  2:36                     ` Yang, Sheng
2010-12-02 13:09                       ` Avi Kivity
2010-12-02 13:47                         ` Michael S. Tsirkin
2010-12-02 13:56                           ` Avi Kivity
2010-12-02 14:26                             ` Michael S. Tsirkin
2010-12-02 14:54                               ` Sheng Yang
2010-12-02 16:55                                 ` Michael S. Tsirkin
2010-12-03  3:03                                   ` Yang, Sheng
2010-11-23 12:04 ` Michael S. Tsirkin
2010-11-23 14:02   ` Yang, Sheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101123125616.GD26313@redhat.com \
    --to=mst@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=sheng.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.