From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH 2/2][RFC] KVM: Emulate MSI-X table and PBA in kernel
Date: Sun, 2 Jan 2011 13:51:35 +0200
Message-ID: <20110102115135.GA712@redhat.com>
References: <1293007495-32325-1-git-send-email-sheng@linux.intel.com>
 <4D1C5124.2090409@redhat.com>
 <20101230103256.GB6441@redhat.com>
 <201012311105.28371.sheng@linux.intel.com>
 <4D2052C3.3020901@redhat.com>
 <20110102103928.GA32272@redhat.com>
 <4D205A6A.10900@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Sheng Yang <sheng@linux.intel.com>,
	Marcelo Tosatti <mtosatti@redhat.com>, kvm@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:44041 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751093Ab1ABLxK (ORCPT <rfc822;kvm@vger.kernel.org>);
	Sun, 2 Jan 2011 06:53:10 -0500
Content-Disposition: inline
In-Reply-To: <4D205A6A.10900@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Sun, Jan 02, 2011 at 12:58:50PM +0200, Avi Kivity wrote:
> On 01/02/2011 12:39 PM, Michael S. Tsirkin wrote:
> >>  >
> >>  >I agree. At least it's not a regression. And in fact we haven't seen any device
> >>  >driver use this. I've checked Linux kernel code, found no one used PCI_MSIX_PBA or
> >>  >msix_pba_offset_reg().
> >>  >
> >>  >I guess it's fine to get MSI-X mask part in first, then deal with PBA part if
> >>  >necessary - though we haven't seen any driver use it so far. It won't be worse
> >>  >with this patch anyway...
> >>
> >>  In a way it is worse because before, the fix would belong in user
> >>  space, which is easier to test and distribute.  Now we have to fix
> >>  it in the kernel.
> >>
> >>  However I recognize that drivers which rely on the pending bit are
> >>  rare/nonexistent (likely only in preboot environments where interrupts
> >>  are hard), so even if we do code it, it will likely be incorrect
> >>  (certainly without a test).
> >>
> >>  So I'll accept the patch without PBA.  Michael, what about
> >>  supporting virtio?  Can we base something on this patch?
> >
> >I don't see how userspace can send interrupts with this
> >interface unfortunately. We also need irqfd support ...
> 
> Sure we'll need additions to that interface.

What I suggested is:
1. an ioctl to map a physical address + size to a table id
2. a new gsi type with a table id + entry number.

If we have that, assigned devices, virtio and vhost-net can work
mostly as is, with just the mask bits accelerated.
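
To illustrate, the two pieces could look roughly like the sketch below.
All names here (msix_table_map, msix_gsi_route, msix_map_table,
msix_resolve) are hypothetical and not an existing KVM interface; the
only facts taken from the PCI spec are the 16-byte MSI-X entry size and
the Vector Control word at offset 12. It is a userspace model of the
lookup, not kernel code, and it does not check for overlapping tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing KVM ABI. */
#define MSIX_ENTRY_SIZE  16u  /* bytes per MSI-X table entry (PCI spec) */
#define MSIX_VECTOR_CTRL 12u  /* offset of Vector Control in an entry */
#define MAX_TABLES       8

/* Item 1: what the proposed "map" ioctl would carry. */
struct msix_table_map {
    uint64_t gpa;       /* guest physical address of the MSI-X table */
    uint32_t size;      /* table size in bytes */
    uint32_t table_id;  /* id chosen by userspace */
};

/* Item 2: what the proposed new gsi type would carry. */
struct msix_gsi_route {
    uint32_t table_id;
    uint32_t entry;     /* entry number within that table */
};

static struct msix_table_map tables[MAX_TABLES];
static size_t nr_tables;

/* Register a table. Returns 0 on success, -1 on a bad size or full registry. */
int msix_map_table(uint64_t gpa, uint32_t size, uint32_t table_id)
{
    if (nr_tables == MAX_TABLES || size == 0 || size % MSIX_ENTRY_SIZE)
        return -1;
    tables[nr_tables++] = (struct msix_table_map){ gpa, size, table_id };
    return 0;
}

/*
 * Resolve a guest MMIO address to (table id, entry number).
 * *is_mask is set to 1 when the access hits the Vector Control word,
 * which is the only part that would need to be accelerated.
 * Returns 0 on a hit, -1 if no registered table covers the address.
 */
int msix_resolve(uint64_t addr, struct msix_gsi_route *route, int *is_mask)
{
    assert(route && is_mask);
    for (size_t i = 0; i < nr_tables; i++) {
        if (addr < tables[i].gpa)
            continue;
        uint64_t off = addr - tables[i].gpa;
        if (off >= tables[i].size)
            continue;
        route->table_id = tables[i].table_id;
        route->entry = (uint32_t)(off / MSIX_ENTRY_SIZE);
        *is_mask = (off % MSIX_ENTRY_SIZE) == MSIX_VECTOR_CTRL;
        return 0;
    }
    return -1;
}
```

With something like this, a mask bit write is handled where the table
is registered, and everything else about the entry can stay with
whoever owns the device.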

> What about vhost-net and vfio?  I thought that they could emulate
> the mask bits:
> 
> - KVM_MMIOFD(vmfd, mmio_range, fd1, fd2) associates an mmio range with an fd
> - writel(mmio_range) or readl(mmio_range) from the guest causes a
> command to be written to fd1
> - for readl(), read from fd2 to see the result (works nicely for
> "pci read flushes posted writes")
> 
> this allows interesting stuff to be implemented in separate
> processes, threads, or kernel modules.

This could work. Some thought needs to be given to how we make sure that
an appropriate type of file is passed in. Maybe using a netlink-based
connector for this is a good idea?
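
For reference, the MMIOFD flow quoted above can be modelled in userspace
roughly as follows. This is only a sketch under assumptions: KVM_MMIOFD
is a proposal and no wire format was defined, so struct mmio_cmd and the
helper names are made up, and plain pipes stand in for fd1/fd2. The
backend here emulates a single 32-bit register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical command format; the proposal does not define one. */
struct mmio_cmd {
    uint64_t addr;      /* guest address inside the registered range */
    uint32_t val;       /* value for writes, ignored for reads */
    uint32_t is_write;  /* 1 = writel(), 0 = readl() */
};

/* "KVM side": turn a guest writel()/readl() into a command on fd1. */
int mmio_send(int fd1, uint64_t addr, uint32_t val, uint32_t is_write)
{
    struct mmio_cmd c;

    memset(&c, 0, sizeof(c));
    c.addr = addr;
    c.val = val;
    c.is_write = is_write;
    return write(fd1, &c, sizeof(c)) == (ssize_t)sizeof(c) ? 0 : -1;
}

/* "KVM side": for readl(), fetch the result from fd2. */
int mmio_recv(int fd2, uint32_t *val)
{
    return read(fd2, val, sizeof(*val)) == (ssize_t)sizeof(*val) ? 0 : -1;
}

/*
 * "Backend side" (another process, thread or kernel module in the
 * proposal): consume one command. Commands arrive in order on one fd,
 * so a read is answered only after all earlier writes were applied,
 * which is the "read flushes posted writes" property.
 */
int backend_handle_one(int cmd_fd, int res_fd, uint32_t *reg)
{
    struct mmio_cmd c;

    if (read(cmd_fd, &c, sizeof(c)) != (ssize_t)sizeof(c))
        return -1;
    if (c.is_write) {
        *reg = c.val;
        return 0;
    }
    return write(res_fd, reg, sizeof(*reg)) == (ssize_t)sizeof(*reg) ? 0 : -1;
}
```

Note the sketch does nothing to verify what kind of file sits behind
the fds, which is exactly the open question.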

OTOH if we have MSIX mask bit emulation in kvm anyway, using it makes
sense ...

> -- 
> error compiling committee.c: too many arguments to function