From: "Michael S. Tsirkin" <mst@redhat.com>
To: Sheng Yang <sheng@linux.intel.com>
Cc: Avi Kivity <avi@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
kvm@vger.kernel.org, Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel
Date: Wed, 29 Dec 2010 11:28:24 +0200 [thread overview]
Message-ID: <20101229092824.GA2876@redhat.com> (raw)
In-Reply-To: <201012291655.19535.sheng@linux.intel.com>
On Wed, Dec 29, 2010 at 04:55:19PM +0800, Sheng Yang wrote:
> On Wednesday 29 December 2010 16:31:35 Michael S. Tsirkin wrote:
> > On Wed, Dec 29, 2010 at 03:18:13PM +0800, Sheng Yang wrote:
> > > On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
> > > > On 12/22/2010 10:44 AM, Sheng Yang wrote:
> > > > > Then we can support mask bit operation of assigned devices now.
> > > > >
> > > > >
> > > > > @@ -3817,14 +3819,16 @@ static int
> > > > > emulator_write_emulated_onepage(unsigned long addr,
> > > > >
> > > > > mmio:
> > > > > trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
> > > > >
> > > > > + r = vcpu_mmio_write(vcpu, gpa, bytes, val);
> > > > >
> > > > > /*
> > > > >
> > > > > * Is this MMIO handled locally?
> > > > > */
> > > > >
> > > > > - if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
> > > > > + if (!r)
> > > > >
> > > > > return X86EMUL_CONTINUE;
> > > > >
> > > > > vcpu->mmio_needed = 1;
> > > > >
> > > > > - vcpu->run->exit_reason = KVM_EXIT_MMIO;
> > > > > + vcpu->run->exit_reason = (r == -ENOTSYNC) ?
> > > > > + KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
> > > >
> > > > This isn't very pretty, exit_reason should be written in
> > > > vcpu_mmio_write(). I guess we can refactor it later.
> > >
> > > Sure.
> > >
> > > > > +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV (1<< 0)
> > > > > +
> > > > > +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE (1<< 8)
> > > > > +#define KVM_MSIX_MMIO_TYPE_BASE_PBA (1<< 9)
> > > > > +
> > > > > +#define KVM_MSIX_MMIO_TYPE_DEV_MASK 0x00ff
> > > > > +#define KVM_MSIX_MMIO_TYPE_BASE_MASK 0xff00
> > > >
> > > > Any explanation of these?
> > >
> > > I chose to use assigned device id instead of one specific table id,
> > > because every device should got at most one MSI MMIO(the same should
> > > applied to vfio device as well), and if we use specific table ID, we
> > > need way to associate with the device anyway, to perform mask/unmask or
> > > other operation. So I think it's better to use device ID here directly.
> >
> > Table id will be needed to make things work for emulated devices.
>
> I suppose even emulated device should got some kind of id(BDF)?
Not that I know. Look at how irqfd is defined for example,
or how interrupts are sent through a gsi.
I would like to make the interface be able to support that.
> I think that is
> enough for identification, which is already there, so we don't need to allocate
> another ID for the device - because one device would got at most one MSI-X MMIO,
> then use BDF or other device specific ID should be quite straightforward.
So you propose allocating ids for emulated devices?
OK. How will we map e.g. irqfds to these?
> >
> > My idea was this: we have the device id in kvm_assigned_msix_entry already.
> > Just put table id and entry number in kvm_irq_routing_entry (create
> > a new gsi type for this).
> > The result will also work for irqfd because these are mapped to gsi.
> >
> > > And for the table and pba address, it's due to the mapping in userspace
> > > may know the guest MSI-X table address and PBA address at different
> > > time(due to different BAR, refer to the code in assigned_dev_iomem_map()
> > > of qemu). So I purposed this API to allow each of them can be passed to
> > > kernel space individually.
> > >
> > > > > +struct kvm_msix_mmio_user {
> > > > > + __u32 dev_id;
> > > > > + __u16 type;
> > > > > + __u16 max_entries_nr;
> > > > > + __u64 base_addr;
> > > > > + __u64 base_va;
> > > > > + __u64 flags;
> > > > > + __u64 reserved[4];
> > > > > +};
> > > > > +
> > > > >
> > > > >
> > > > > +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
> > > > > + int assigned_dev_id, int entry, u32 flag)
> > > > > +{
> > > >
> > > > Need a better name for 'flag' (and make it a bool).
> > > >
> > > > > + int r = -EFAULT;
> > > > > + struct kvm_assigned_dev_kernel *adev;
> > > > > + int i;
> > > > > +
> > > > > + if (!irqchip_in_kernel(kvm))
> > > > > + return r;
> > > > > +
> > > > > + mutex_lock(&kvm->lock);
> > > > > + adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
> > > > > + assigned_dev_id);
> > > > > + if (!adev)
> > > > > + goto out;
> > > > > +
> > > > > + for (i = 0; i< adev->entries_nr; i++)
> > > > > + if (adev->host_msix_entries[i].entry == entry) {
> > > > > + if (flag)
> > > > > + disable_irq_nosync(
> > > > > + adev->host_msix_entries[i].vector);
> > > > > + else
> > > > > + enable_irq(adev->host_msix_entries[i].vector);
> > > > > + r = 0;
> > > > > + break;
> > > > > + }
> > > > > +out:
> > > > > + mutex_unlock(&kvm->lock);
> > > > > + return r;
> > > > > +}
> > > > >
> > > > > @@ -1988,6 +2008,12 @@ static int kvm_dev_ioctl_create_vm(void)
> > > > >
> > > > > return r;
> > > > >
> > > > > }
> > > > >
> > > > > #endif
> > > > >
> > > > > + r = kvm_register_msix_mmio_dev(kvm);
> > > > > + if (r< 0) {
> > > > > + kvm_put_kvm(kvm);
> > > > > + return r;
> > > > > + }
> > > >
> > > > Shouldn't this be part of individual KVM_REGISTER_MSIX_MMIO calls?
> > >
> > > In fact this MMIO device is more like global one for the VM, not for
> > > every devices. It should handle all MMIO from all MSI-X enabled devices,
> > > so I put it in the VM init/destroy process.
> > >
> > > > > +static int msix_table_mmio_read(struct kvm_io_device *this, gpa_t
> > > > > addr, int len, + void *val)
> > > > > +{
> > > > > + struct kvm_msix_mmio_dev *mmio_dev =
> > > > > + container_of(this, struct kvm_msix_mmio_dev, table_dev);
> > > > > + struct kvm_msix_mmio *mmio;
> > > > > + int idx, ret = 0, entry, offset, r;
> > > > > +
> > > > > + mutex_lock(&mmio_dev->lock);
> > > > > + idx = get_mmio_table_index(mmio_dev, addr, len);
> > > > > + if (idx< 0) {
> > > > > + ret = -EOPNOTSUPP;
> > > > > + goto out;
> > > > > + }
> > > > > + if ((addr& 0x3) || (len != 4&& len != 8))
> > > > > + goto out;
> > > >
> > > > What about (addr & 4) && (len == 8)? Is it supported? It may cross
> > > > entry boundaries.
> > >
> > > Should not supported. But I haven't found words on the PCI spec for it.
> > > So I didn't add this check.
> >
> > IMPLEMENTATION NOTE
> > MSI-X Memory Space Structures in Read/Write Memory
> >
> > ....
> >
> > For all accesses to MSI-X Table and MSI-X PBA fields, software must use
> > aligned full
> > DWORD or aligned full QWORD transactions; otherwise, the result is
> > undefined.
>
> Yes, this one is enough, I would add the checking.
> >
> > > > > + mmio =&mmio_dev->mmio[idx];
> > > > > +
> > > > > + entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
> > > > > + offset = addr& 0xf;
> > > > > + r = copy_from_user(val, (void *)(mmio->table_base_va +
> > > > > + entry * PCI_MSIX_ENTRY_SIZE + offset), len);
> > > > >
> > > > >
> > > > > + if (r)
> > > > > + goto out;
> > > > > +out:
> > > > > + mutex_unlock(&mmio_dev->lock);
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > > +static int msix_table_mmio_write(struct kvm_io_device *this, gpa_t
> > > > > addr, + int len, const void *val)
> > > > > +{
> > > > > + struct kvm_msix_mmio_dev *mmio_dev =
> > > > > + container_of(this, struct kvm_msix_mmio_dev, table_dev);
> > > > > + struct kvm_msix_mmio *mmio;
> > > > > + int idx, entry, offset, ret = 0, r = 0;
> > > > > + gpa_t entry_base;
> > > > > + u32 old_ctrl, new_ctrl;
> > > > > +
> > > > > + mutex_lock(&mmio_dev->lock);
> > > > > + idx = get_mmio_table_index(mmio_dev, addr, len);
> > > > > + if (idx< 0) {
> > > > > + ret = -EOPNOTSUPP;
> > > > > + goto out;
> > > > > + }
> > > > > + if ((addr& 0x3) || (len != 4&& len != 8))
> > > > > + goto out;
> > > > > + mmio =&mmio_dev->mmio[idx];
> > > > > + entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
> > > > > + entry_base = mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
> > > > > + offset = addr& 0xF;
> > > > > +
> > > > > + if (copy_from_user(&old_ctrl,
> > > > > + entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL,
> > > > > + sizeof old_ctrl))
> > > > > + goto out;
> > > >
> > > > get_user() is easier.
> > > >
> > > > > +
> > > > > + /* No allow writing to other fields when entry is unmasked */
> > > > > + if (!(old_ctrl& PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > > > > + offset != PCI_MSIX_ENTRY_VECTOR_CTRL)
> > > > > + goto out;
> > > > > +
> > > > > + if (copy_to_user(entry_base + offset, val, len))
> > > > > + goto out;
> > > > >
> > > > > +
> > > > > + if (copy_from_user(&new_ctrl,
> > > > > + entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL,
> > > > > + sizeof new_ctrl))
> > > > > + goto out;
> > > >
> > > > put_user()
> > > >
> > > > > +
> > > > > + if ((offset< PCI_MSIX_ENTRY_VECTOR_CTRL&& len == 4) ||
> > > > > + (offset< PCI_MSIX_ENTRY_DATA&& len == 8))
> > > > > + ret = -ENOTSYNC;
> > > > > + if (old_ctrl == new_ctrl)
> > > > > + goto out;
> > > > > + if (!(old_ctrl& PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > > > > + (new_ctrl& PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > > + r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 1);
> > > > > + else if ((old_ctrl& PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > > > > + !(new_ctrl& PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > > + r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 0);
> > > > > + if (r || ret)
> > > > > + ret = -ENOTSYNC;
> > > > > +out:
> > > > > + mutex_unlock(&mmio_dev->lock);
> > > > > + return ret;
> > > > > +}
> > > >
> > > > blank line...
> > > >
> > > > > +static const struct kvm_io_device_ops msix_mmio_table_ops = {
> > > > > + .read = msix_table_mmio_read,
> > > > > + .write = msix_table_mmio_write,
> > > > > +};
> > > > > +
> > > > > ++
> > > > > +int kvm_vm_ioctl_register_msix_mmio(struct kvm *kvm,
> > > > > + struct kvm_msix_mmio_user *mmio_user)
> > > > > +{
> > > > > + struct kvm_msix_mmio_dev *mmio_dev =&kvm->msix_mmio_dev;
> > > > > + struct kvm_msix_mmio *mmio = NULL;
> > > > > + int r = 0, i;
> > > > > +
> > > > > + mutex_lock(&mmio_dev->lock);
> > > > > + for (i = 0; i< mmio_dev->mmio_nr; i++) {
> > > > > + if (mmio_dev->mmio[i].dev_id == mmio_user->dev_id&&
> > > > > + (mmio_dev->mmio[i].type& KVM_MSIX_MMIO_TYPE_DEV_MASK) ==
> > > > > + (mmio_user->type& KVM_MSIX_MMIO_TYPE_DEV_MASK)) {
> > > > > + mmio =&mmio_dev->mmio[i];
> > > > > + if (mmio->max_entries_nr != mmio_user->max_entries_nr) {
> > > > > + r = -EINVAL;
> > > > > + goto out;
> > > > > + }
> > > > > + break;
> > > > > + }
> > > > > + }
> > > > > + if (!mmio) {
> > > > > + if (mmio_dev->mmio_nr == KVM_MSIX_MMIO_MAX) {
> > > > > + r = -ENOSPC;
> > > > > + goto out;
> > > > > + }
> > > > > + mmio =&mmio_dev->mmio[mmio_dev->mmio_nr];
> > > > > + mmio_dev->mmio_nr++;
> > > > > + }
> > > > > + mmio->max_entries_nr = mmio_user->max_entries_nr;
> > > >
> > > > Sanity check to avoid overflow.
> > > >
> > > > > + mmio->dev_id = mmio_user->dev_id;
> > > > > + mmio->flags = mmio_user->flags;
> > > >
> > > > Check for unsupported bits (all of them at present?)
> > > >
> > > > > + if ((mmio_user->type& KVM_MSIX_MMIO_TYPE_DEV_MASK) ==
> > > > > + KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV)
> > > > > + mmio->type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV;
> > > > > +
> > > > > + if ((mmio_user->type& KVM_MSIX_MMIO_TYPE_BASE_MASK) ==
> > > > > + KVM_MSIX_MMIO_TYPE_BASE_TABLE) {
> > > > > + mmio->type |= KVM_MSIX_MMIO_TYPE_BASE_TABLE;
> > > > > + mmio->table_base_addr = mmio_user->base_addr;
> > > > > + mmio->table_base_va = mmio_user->base_va;
> > > >
> > > > Check for va in kernel space.
> > > >
> > > > > + } else if ((mmio_user->type& KVM_MSIX_MMIO_TYPE_BASE_MASK) ==
> > > > > + KVM_MSIX_MMIO_TYPE_BASE_PBA) {
> > > > > + mmio->type |= KVM_MSIX_MMIO_TYPE_BASE_PBA;
> > > > > + mmio->pba_base_addr = mmio_user->base_addr;
> > > > > + mmio->pba_base_va = mmio_user->base_va;
> > > > > + }
> > > > > +out:
> > > > > + mutex_unlock(&mmio_dev->lock);
> > > > > + return r;
> > > > > +}
> > > > > +
> > > > > +
> > > >
> > > > In all, looks reasonable. I'd like to see documentation for it, and
> > > > review from the pci people. Alex, mst?
> >
> > Some general comments:
> > PBA isn't supported in this version, which is OK, but let's not add a
> > capability until it is, and let's not try to guess what
> > the interface will look like. I think keeping PBA in userspace will be hard
> > because it needs to be modified from interrupt context.
> > Removing the PBA stub will make the interface simpler.
>
> The API only get the PBA address now which should be fine. And we still have
> threaded irq and tasklet for accessing the userspace for interrupt handler...
I don't think it's going to work: we are not
in the context of the right process. Further
I think we should keep the option of
reading the PBA status from the device or host kernel open.
And generally having an interface
for functionality we don't implement is not a good idea:
you don't know whether you really can support the interface you promised.
> --
> regards
> Yang, Sheng
>
> > > Would add the API document soon.
> > >
> > > --
> > > regards
> > > Yang, Sheng
next prev parent reply other threads:[~2010-12-29 9:28 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-22 8:44 [PATCH 0/2 v6] MSI-X mask bit support for KVM Sheng Yang
2010-12-22 8:44 ` [PATCH 1/2] KVM: Move struct kvm_io_device to kvm_host.h Sheng Yang
2010-12-22 8:44 ` [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel Sheng Yang
2010-12-28 12:26 ` Avi Kivity
2010-12-29 7:18 ` Sheng Yang
2010-12-29 8:31 ` Michael S. Tsirkin
2010-12-29 8:55 ` Sheng Yang
2010-12-29 9:28 ` Michael S. Tsirkin [this message]
2010-12-30 7:32 ` Sheng Yang
2010-12-30 7:47 ` Michael S. Tsirkin
2010-12-30 7:55 ` Sheng Yang
2010-12-30 8:15 ` Michael S. Tsirkin
2010-12-30 8:24 ` Sheng Yang
2010-12-30 8:52 ` Michael S. Tsirkin
2010-12-30 9:13 ` Sheng Yang
2010-12-30 9:30 ` Avi Kivity
2010-12-30 10:32 ` Michael S. Tsirkin
2010-12-30 10:37 ` Avi Kivity
2010-12-30 11:07 ` Michael S. Tsirkin
2010-12-30 11:27 ` Avi Kivity
2010-12-30 12:17 ` Michael S. Tsirkin
2010-12-31 3:05 ` Sheng Yang
2011-01-02 9:26 ` Michael S. Tsirkin
2011-01-02 10:26 ` Avi Kivity
2011-01-02 10:39 ` Michael S. Tsirkin
2011-01-02 10:58 ` Avi Kivity
2011-01-02 11:51 ` Michael S. Tsirkin
2011-01-02 13:34 ` Avi Kivity
2011-01-02 13:57 ` Michael S. Tsirkin
2010-12-30 9:28 ` Avi Kivity
2010-12-30 10:03 ` Michael S. Tsirkin
2010-12-28 4:05 ` [PATCH 0/2 v6] MSI-X mask bit support for KVM Sheng Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101229092824.GA2876@redhat.com \
--to=mst@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=sheng@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox