All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Yang, Sheng" <sheng.yang@intel.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: Mask bit support's API
Date: Tue, 23 Nov 2010 21:57:05 +0800	[thread overview]
Message-ID: <201011232157.05130.sheng.yang@intel.com> (raw)
In-Reply-To: <4CEBB7E5.5030601@redhat.com>

On Tuesday 23 November 2010 20:47:33 Avi Kivity wrote:
> On 11/23/2010 10:30 AM, Yang, Sheng wrote:
> > On Tuesday 23 November 2010 15:54:40 Avi Kivity wrote:
> > >  On 11/23/2010 08:35 AM, Yang, Sheng wrote:
> > >  >  On Tuesday 23 November 2010 14:17:28 Avi Kivity wrote:
> > >  >  >   On 11/23/2010 08:09 AM, Yang, Sheng wrote:
> > >  >  >   >   Hi Avi,
> > >  >  >   >   
> > >  >  >   >   I've purposed the following API for mask bit support.
> > >  >  >   >   
> > >  >  >   >   The main point is, QEmu can know which entries are
> > >  >  >   >   enabled(by pci_enable_msix()). And for enabled entries,
> > >  >  >   >   kernel own it, including MSI data/address and mask
> > >  >  >   >   bit(routing table and mask bitmap). QEmu should use
> > >  >  >   >   KVM_GET_MSIX_ENTRY ioctl to get them(and it can sync with
> > >  >  >   >   them if it want to do so).
> > >  >  >   >   
> > >  >  >   >   Before entries are enabled, QEmu can still use it's own MSI
> > >  >  >   >   table(because we didn't contain these kind of information
> > >  >  >   >   in kernel, and it's unnecessary for kernel).
> > >  >  >   >   
> > >  >  >   >   The KVM_MSIX_FLAG_ENTRY flag would be clear if QEmu want to
> > >  >  >   >   query one entry didn't exist in kernel - or we can simply
> > >  >  >   >   return -EINVAL for it.
> > >  >  >   >   
> > >  >  >   >   I suppose it would be rare for QEmu to use this interface
> > >  >  >   >   to get the context of entry(the only case I think is when
> > >  >  >   >   MSI-X disable and QEmu need to sync the context), so
> > >  >  >   >   performance should not be an issue.
> > >  >  >   >   
> > >  >  >   >   What's your opinion?
> > >  >  >   >   
> > >  >  >   >   >    #define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d,
> > >  >  >   >   >    struct kvm_msix_entry)
> > >  >  >   
> > >  >  >   Need SET_MSIX_ENTRY for live migration as well.
> > >  >  
> > >  >  Current we don't support LM with VT-d...
> > >  
> > >  Isn't this work useful for virtio as well?
> > 
> > Yeah, but won't be included in this patchset.
> 
> What API changes are needed?  I'd like to see the complete API.

I am not sure about it. But I suppose the structure should be the same? In fact 
it's pretty hard for me to image what's needed for virtio in the future, 
especially there is no such code now. I really prefer to deal with assigned device 
and virtio separately, which would make the work much easier. But seems you won't 
agree on that.

> 
> > >  >  >   What about the pending bits?
> > >  >  
> > >  >  We didn't cover it here - and it's in another MMIO space(PBA). Of
> > >  >  course we can add more flags for it later.
> > >  
> > >  When an entry is masked, we need to set the pending bit for it
> > >  somewhere.  I guess this is broken in the existing code (without your
> > >  patches)?
> > 
> > Even with my patch, we didn't support the pending bit. It would always
> > return 0 now. What we supposed to do(after my patch checked in) is to
> > check IRQ_PENDING flag of irq_desc->status(if the entry is masked), and
> > return the result to userspace.
> > 
> > That would involve some core change, like to export irq_to_desc(). I
> > don't think it would be accepted soon, so would push mask bit first.
> 
> The API needs to be compatible with the pending bit, even if we don't
> implement it now.  I want to reduce the rate of API changes.

This can be implemented by this API, just adding a flag for it. And I would still 
take this into consideration in the next API purposal.
 
> > >  >  >   Also need a new exit reason to tell userspace that an msix
> > >  >  >   entry has changed, so userspace can update mappings.
> > >  >  
> > >  >  I think we don't need it. Whenever userspace want to get one
> > >  >  mapping which is an enabled MSI-X entry, it can check it with the
> > >  >  API above(which is quite rare, because kernel would handle all of
> > >  >  them when guest is accessing them). If it's a disabled entry, the
> > >  >  context inside userspace MMIO record is the correct one(and only
> > >  >  one). The only place I think QEmu need to sync is when MSI-X is
> > >  >  about to disabled, QEmu need to update it's own MMIO record.
> > >  
> > >  So in-kernel handling of mmio would be decided per entry?  I'm trying
> > >  to simplify this, and simplest thing is - all or nothing.
> > 
> > So you would like to handle all MSI-X MMIO in kernel?
> 
> Yes.  Writes to address or data would be handled by:
> - recording it into the shadow msix table
> - notifying userspace that msix entry x changed
> Reads would be handled in kernel from the shadow msix table.
> 
> So instead of
> 
> - guest reads/writes msix
> - kvm filters mmio, implements some, passes others to userspace
> 
> we have
> 
> - guest reads/writes msix
> - kvm implements all
> - some writes generate an additional notification to userspace

I suppose we don't need to generate notification to userspace? Because every 
read/write is handled by kernel, and userspace just need interface to kernel to 
get/set the entry - and well, does userspace need to do it when kernel can handle 
all of them? Maybe not...

--
regards
Yang, Sheng

  parent reply	other threads:[~2010-11-23 13:56 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-23  6:09 Mask bit support's API Yang, Sheng
2010-11-23  6:17 ` Avi Kivity
2010-11-23  6:35   ` Yang, Sheng
2010-11-23  7:54     ` Avi Kivity
2010-11-23  8:30       ` Yang, Sheng
2010-11-23 12:47         ` Avi Kivity
2010-11-23 12:56           ` Michael S. Tsirkin
2010-11-23 13:57           ` Yang, Sheng [this message]
2010-11-23 14:06             ` Avi Kivity
2010-11-23 15:11               ` Michael S. Tsirkin
2010-11-23 15:24                 ` Gleb Natapov
2010-11-23 16:10                   ` Michael S. Tsirkin
2010-11-24  1:59               ` Yang, Sheng
2010-11-26  2:35                 ` Yang, Sheng
2010-11-30 14:15                   ` Avi Kivity
2010-12-01  2:36                     ` Yang, Sheng
2010-12-02 13:09                       ` Avi Kivity
2010-12-02 13:47                         ` Michael S. Tsirkin
2010-12-02 13:56                           ` Avi Kivity
2010-12-02 14:26                             ` Michael S. Tsirkin
2010-12-02 14:54                               ` Sheng Yang
2010-12-02 16:55                                 ` Michael S. Tsirkin
2010-12-03  3:03                                   ` Yang, Sheng
2010-11-23 12:04 ` Michael S. Tsirkin
2010-11-23 14:02   ` Yang, Sheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201011232157.05130.sheng.yang@intel.com \
    --to=sheng.yang@intel.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.