public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sheng Yang <sheng@linux.intel.com>
To: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	kvm@vger.kernel.org, Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel
Date: Wed, 29 Dec 2010 15:18:13 +0800	[thread overview]
Message-ID: <201012291518.13449.sheng@linux.intel.com> (raw)
In-Reply-To: <4D19D765.7080507@redhat.com>

On Tuesday 28 December 2010 20:26:13 Avi Kivity wrote:
> On 12/22/2010 10:44 AM, Sheng Yang wrote:
> > Then we can support mask bit operation of assigned devices now.
> > 
> > 
> > @@ -3817,14 +3819,16 @@ static int
> > emulator_write_emulated_onepage(unsigned long addr,
> > 
> >   mmio:
> >   	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
> > 
> > +	r = vcpu_mmio_write(vcpu, gpa, bytes, val);
> > 
> >   	/*
> >   	
> >   	 * Is this MMIO handled locally?
> >   	 */
> > 
> > -	if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
> > +	if (!r)
> > 
> >   		return X86EMUL_CONTINUE;
> >   	
> >   	vcpu->mmio_needed = 1;
> > 
> > -	vcpu->run->exit_reason = KVM_EXIT_MMIO;
> > +	vcpu->run->exit_reason = (r == -ENOTSYNC) ?
> > +		KVM_EXIT_MSIX_ROUTING_UPDATE : KVM_EXIT_MMIO;
> 
> This isn't very pretty, exit_reason should be written in
> vcpu_mmio_write().  I guess we can refactor it later.

Sure.
> 
> > +#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV	    (1<<  0)
> > +
> > +#define KVM_MSIX_MMIO_TYPE_BASE_TABLE	    (1<<  8)
> > +#define KVM_MSIX_MMIO_TYPE_BASE_PBA	    (1<<  9)
> > +
> > +#define KVM_MSIX_MMIO_TYPE_DEV_MASK	    0x00ff
> > +#define KVM_MSIX_MMIO_TYPE_BASE_MASK	    0xff00
> 
> Any explanation of these?

I chose to use assigned device id instead of one specific table id, because every 
device should got at most one MSI MMIO(the same should applied to vfio device as 
well), and if we use specific table ID, we need way to associate with the device 
anyway, to perform mask/unmask or other operation. So I think it's better to use 
device ID here directly. 

And for the table and pba address, it's due to the mapping in userspace may know 
the guest MSI-X table address and PBA address at different time(due to different 
BAR, refer to the code in assigned_dev_iomem_map() of qemu). So I purposed this 
API to allow each of them can be passed to kernel space individually.
> 
> > +struct kvm_msix_mmio_user {
> > +	__u32 dev_id;
> > +	__u16 type;
> > +	__u16 max_entries_nr;
> > +	__u64 base_addr;
> > +	__u64 base_va;
> > +	__u64 flags;
> > +	__u64 reserved[4];
> > +};
> > +
> > 
> > 
> > +int kvm_assigned_device_update_msix_mask_bit(struct kvm *kvm,
> > +				int assigned_dev_id, int entry, u32 flag)
> > +{
> 
> Need a better name for 'flag' (and make it a bool).
> 
> > +	int r = -EFAULT;
> > +	struct kvm_assigned_dev_kernel *adev;
> > +	int i;
> > +
> > +	if (!irqchip_in_kernel(kvm))
> > +		return r;
> > +
> > +	mutex_lock(&kvm->lock);
> > +	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
> > +				      assigned_dev_id);
> > +	if (!adev)
> > +		goto out;
> > +
> > +	for (i = 0; i<  adev->entries_nr; i++)
> > +		if (adev->host_msix_entries[i].entry == entry) {
> > +			if (flag)
> > +				disable_irq_nosync(
> > +					adev->host_msix_entries[i].vector);
> > +			else
> > +				enable_irq(adev->host_msix_entries[i].vector);
> > +			r = 0;
> > +			break;
> > +		}
> > +out:
> > +	mutex_unlock(&kvm->lock);
> > +	return r;
> > +}
> > 
> > @@ -1988,6 +2008,12 @@ static int kvm_dev_ioctl_create_vm(void)
> > 
> >   		return r;
> >   	
> >   	}
> >   
> >   #endif
> > 
> > +	r = kvm_register_msix_mmio_dev(kvm);
> > +	if (r<  0) {
> > +		kvm_put_kvm(kvm);
> > +		return r;
> > +	}
> 
> Shouldn't this be part of individual KVM_REGISTER_MSIX_MMIO calls?

In fact this MMIO device is more like global one for the VM, not for every 
devices. It should handle all MMIO from all MSI-X enabled devices, so I put it in 
the VM init/destroy process.

> > +static int msix_table_mmio_read(struct kvm_io_device *this, gpa_t addr,
> > int len, +				void *val)
> > +{
> > +	struct kvm_msix_mmio_dev *mmio_dev =
> > +		container_of(this, struct kvm_msix_mmio_dev, table_dev);
> > +	struct kvm_msix_mmio *mmio;
> > +	int idx, ret = 0, entry, offset, r;
> > +
> > +	mutex_lock(&mmio_dev->lock);
> > +	idx = get_mmio_table_index(mmio_dev, addr, len);
> > +	if (idx<  0) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +	if ((addr&  0x3) || (len != 4&&  len != 8))
> > +		goto out;
> 
> What about (addr & 4) && (len == 8)? Is it supported? It may cross entry
> boundaries.

Should not supported. But I haven't found words on the PCI spec for it. So I 
didn't add this check.
> 
> > +	mmio =&mmio_dev->mmio[idx];
> > +
> > +	entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
> > +	offset = addr&  0xf;
> > +	r = copy_from_user(val, (void *)(mmio->table_base_va +
> > +			entry * PCI_MSIX_ENTRY_SIZE + offset), len);
> > 
> > 
> > +	if (r)
> > +		goto out;
> > +out:
> > +	mutex_unlock(&mmio_dev->lock);
> > +	return ret;
> > +}
> > +
> > +static int msix_table_mmio_write(struct kvm_io_device *this, gpa_t addr,
> > +				int len, const void *val)
> > +{
> > +	struct kvm_msix_mmio_dev *mmio_dev =
> > +		container_of(this, struct kvm_msix_mmio_dev, table_dev);
> > +	struct kvm_msix_mmio *mmio;
> > +	int idx, entry, offset, ret = 0, r = 0;
> > +	gpa_t entry_base;
> > +	u32 old_ctrl, new_ctrl;
> > +
> > +	mutex_lock(&mmio_dev->lock);
> > +	idx = get_mmio_table_index(mmio_dev, addr, len);
> > +	if (idx<  0) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +	if ((addr&  0x3) || (len != 4&&  len != 8))
> > +		goto out;
> > +	mmio =&mmio_dev->mmio[idx];
> > +	entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
> > +	entry_base = mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
> > +	offset = addr&  0xF;
> > +
> > +	if (copy_from_user(&old_ctrl,
> > +			entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL,
> > +			sizeof old_ctrl))
> > +		goto out;
> 
> get_user() is easier.
> 
> > +
> > +	/* No allow writing to other fields when entry is unmasked */
> > +	if (!(old_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > +	    offset != PCI_MSIX_ENTRY_VECTOR_CTRL)
> > +		goto out;
> > +
> > +	if (copy_to_user(entry_base + offset, val, len))
> > +		goto out;
> > 
> > +
> > +	if (copy_from_user(&new_ctrl,
> > +			entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL,
> > +			sizeof new_ctrl))
> > +		goto out;
> 
> put_user()
> 
> > +
> > +	if ((offset<  PCI_MSIX_ENTRY_VECTOR_CTRL&&  len == 4) ||
> > +	    (offset<  PCI_MSIX_ENTRY_DATA&&  len == 8))
> > +		ret = -ENOTSYNC;
> > +	if (old_ctrl == new_ctrl)
> > +		goto out;
> > +	if (!(old_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > +			(new_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > +		r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 1);
> > +	else if ((old_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT)&&
> > +			!(new_ctrl&  PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > +		r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 0);
> > +	if (r || ret)
> > +		ret = -ENOTSYNC;
> > +out:
> > +	mutex_unlock(&mmio_dev->lock);
> > +	return ret;
> > +}
> 
> blank line...
> 
> > +static const struct kvm_io_device_ops msix_mmio_table_ops = {
> > +	.read     = msix_table_mmio_read,
> > +	.write    = msix_table_mmio_write,
> > +};
> > +
> > ++
> > +int kvm_vm_ioctl_register_msix_mmio(struct kvm *kvm,
> > +				    struct kvm_msix_mmio_user *mmio_user)
> > +{
> > +	struct kvm_msix_mmio_dev *mmio_dev =&kvm->msix_mmio_dev;
> > +	struct kvm_msix_mmio *mmio = NULL;
> > +	int r = 0, i;
> > +
> > +	mutex_lock(&mmio_dev->lock);
> > +	for (i = 0; i<  mmio_dev->mmio_nr; i++) {
> > +		if (mmio_dev->mmio[i].dev_id == mmio_user->dev_id&&
> > +		    (mmio_dev->mmio[i].type&  KVM_MSIX_MMIO_TYPE_DEV_MASK) ==
> > +		    (mmio_user->type&  KVM_MSIX_MMIO_TYPE_DEV_MASK)) {
> > +			mmio =&mmio_dev->mmio[i];
> > +			if (mmio->max_entries_nr != mmio_user->max_entries_nr) {
> > +				r = -EINVAL;
> > +				goto out;
> > +			}
> > +			break;
> > +		}
> > +	}
> > +	if (!mmio) {
> > +		if (mmio_dev->mmio_nr == KVM_MSIX_MMIO_MAX) {
> > +			r = -ENOSPC;
> > +			goto out;
> > +		}
> > +		mmio =&mmio_dev->mmio[mmio_dev->mmio_nr];
> > +		mmio_dev->mmio_nr++;
> > +	}
> > +	mmio->max_entries_nr = mmio_user->max_entries_nr;
> 
> Sanity check to avoid overflow.
> 
> > +	mmio->dev_id = mmio_user->dev_id;
> > +	mmio->flags = mmio_user->flags;
> 
> Check for unsupported bits (all of them at present?)
> 
> > +	if ((mmio_user->type&  KVM_MSIX_MMIO_TYPE_DEV_MASK) ==
> > +			KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV)
> > +		mmio->type = KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV;
> > +
> > +	if ((mmio_user->type&  KVM_MSIX_MMIO_TYPE_BASE_MASK) ==
> > +			KVM_MSIX_MMIO_TYPE_BASE_TABLE) {
> > +		mmio->type |= KVM_MSIX_MMIO_TYPE_BASE_TABLE;
> > +		mmio->table_base_addr = mmio_user->base_addr;
> > +		mmio->table_base_va = mmio_user->base_va;
> 
> Check for va in kernel space.
> 
> > +	} else if ((mmio_user->type&  KVM_MSIX_MMIO_TYPE_BASE_MASK) ==
> > +			KVM_MSIX_MMIO_TYPE_BASE_PBA) {
> > +		mmio->type |= KVM_MSIX_MMIO_TYPE_BASE_PBA;
> > +		mmio->pba_base_addr = mmio_user->base_addr;
> > +		mmio->pba_base_va = mmio_user->base_va;
> > +	}
> > +out:
> > +	mutex_unlock(&mmio_dev->lock);
> > +	return r;
> > +}
> > +
> > +
> 
> In all, looks reasonable.  I'd like to see documentation for it, and
> review from the pci people.  Alex, mst?

Would add the API document soon.

--
regards
Yang, Sheng

  reply	other threads:[~2010-12-29  7:17 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-22  8:44 [PATCH 0/2 v6] MSI-X mask bit support for KVM Sheng Yang
2010-12-22  8:44 ` [PATCH 1/2] KVM: Move struct kvm_io_device to kvm_host.h Sheng Yang
2010-12-22  8:44 ` [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel Sheng Yang
2010-12-28 12:26   ` Avi Kivity
2010-12-29  7:18     ` Sheng Yang [this message]
2010-12-29  8:31       ` Michael S. Tsirkin
2010-12-29  8:55         ` Sheng Yang
2010-12-29  9:28           ` Michael S. Tsirkin
2010-12-30  7:32             ` Sheng Yang
2010-12-30  7:47               ` Michael S. Tsirkin
2010-12-30  7:55                 ` Sheng Yang
2010-12-30  8:15                   ` Michael S. Tsirkin
2010-12-30  8:24                     ` Sheng Yang
2010-12-30  8:52                       ` Michael S. Tsirkin
2010-12-30  9:13                         ` Sheng Yang
2010-12-30  9:30                 ` Avi Kivity
2010-12-30 10:32                   ` Michael S. Tsirkin
2010-12-30 10:37                     ` Avi Kivity
2010-12-30 11:07                       ` Michael S. Tsirkin
2010-12-30 11:27                         ` Avi Kivity
2010-12-30 12:17                           ` Michael S. Tsirkin
2010-12-31  3:05                     ` Sheng Yang
2011-01-02  9:26                       ` Michael S. Tsirkin
2011-01-02 10:26                       ` Avi Kivity
2011-01-02 10:39                         ` Michael S. Tsirkin
2011-01-02 10:58                           ` Avi Kivity
2011-01-02 11:51                             ` Michael S. Tsirkin
2011-01-02 13:34                               ` Avi Kivity
2011-01-02 13:57                                 ` Michael S. Tsirkin
2010-12-30  9:28               ` Avi Kivity
2010-12-30 10:03                 ` Michael S. Tsirkin
2010-12-28  4:05 ` [PATCH 0/2 v6] MSI-X mask bit support for KVM Sheng Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201012291518.13449.sheng@linux.intel.com \
    --to=sheng@linux.intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox