From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: [PATCH v3] KVM: Allow host IRQ sharing for assigned PCI 2.3 devices Date: Tue, 28 Feb 2012 02:15:23 +0100 Message-ID: <4F4C2AAB.1060001@web.de> References: <4F355F4A.3010507@siemens.com> <1330376747.16474.329.camel@bling.home> <4F4BFE9E.2090900@web.de> <1330384513.29701.52.camel@bling.home> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig02698C98A630F7DC2A73F7AA" Cc: Avi Kivity , Marcelo Tosatti , kvm , "Michael S. Tsirkin" To: Alex Williamson Return-path: Received: from fmmailgate06.web.de ([217.72.192.247]:42773 "EHLO fmmailgate06.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965179Ab2B1BYW (ORCPT ); Mon, 27 Feb 2012 20:24:22 -0500 Received: from moweb001.kundenserver.de (moweb001.kundenserver.de [172.19.20.114]) by fmmailgate06.web.de (Postfix) with ESMTP id BB612D2A4A0 for ; Tue, 28 Feb 2012 02:15:36 +0100 (CET) In-Reply-To: <1330384513.29701.52.camel@bling.home> Sender: kvm-owner@vger.kernel.org List-ID: This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig02698C98A630F7DC2A73F7AA Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2012-02-28 00:15, Alex Williamson wrote: > On Mon, 2012-02-27 at 23:07 +0100, Jan Kiszka wrote: >> On 2012-02-27 22:05, Alex Williamson wrote: >>> On Fri, 2012-02-10 at 19:17 +0100, Jan Kiszka wrote: >>>> PCI 2.3 allows to generically disable IRQ sources at device level. T= his >>>> enables us to share legacy IRQs of such devices with other host devi= ces >>>> when passing them to a guest. >>>> >>>> The new IRQ sharing feature introduced here is optional, user space = has >>>> to request it explicitly. Moreover, user space can inform us about i= ts >>>> view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the >>>> interrupt and signaling it if the guest masked it via the virtualize= d >>>> PCI config space. >>>> >>>> Signed-off-by: Jan Kiszka >>>> --- >>>> >>>> Changes in v3: >>>> - rebased over current kvm.git (no code conflict, just api.txt) >>>> >>>> Documentation/virtual/kvm/api.txt | 31 ++++++ >>>> arch/x86/kvm/x86.c | 1 + >>>> include/linux/kvm.h | 6 + >>>> include/linux/kvm_host.h | 2 + >>>> virt/kvm/assigned-dev.c | 208 ++++++++++++++++++++++++++= +++++----- >>>> 5 files changed, 219 insertions(+), 29 deletions(-) >>>> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtu= al/kvm/api.txt >>>> index 59a3826..5ce0e29 100644 >>>> --- a/Documentation/virtual/kvm/api.txt >>>> +++ b/Documentation/virtual/kvm/api.txt >>>> @@ -1169,6 +1169,14 @@ following flags are specified: >>>> =20 >>>> /* Depends on KVM_CAP_IOMMU */ >>>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) >>>> +/* The following two depend on KVM_CAP_PCI_2_3 */ >>>> +#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >>>> +#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) >>>> + >>>> +If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INT= x interrupts >>>> +via the PCI-2.3-compliant device-level mask, thus enable IRQ sharin= g with other >>>> +assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifie= s the >>>> +guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for det= ails. >>>> =20 >>>> The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensur= e >>>> isolation of the device. Usages not specifying this flag are depre= cated. >>>> @@ -1441,6 +1449,29 @@ The "num_dirty" field is a performance hint f= or KVM to determine whether it >>>> should skip processing the bitmap and just invalidate everything. = It must >>>> be set to the number of set bits in the bitmap. >>>> =20 >>>> +4.60 KVM_ASSIGN_SET_INTX_MASK >>>> + >>>> +Capability: KVM_CAP_PCI_2_3 >>>> +Architectures: x86 >>>> +Type: vm ioctl >>>> +Parameters: struct kvm_assigned_pci_dev (in) >>>> +Returns: 0 on success, -1 on error >>>> + >>>> +Informs the kernel about the guest's view on the INTx mask. As long= as the >>>> +guest masks the legacy INTx, the kernel will refrain from unmasking= it at >>>> +hardware level and will not assert the guest's IRQ line. User space= is still >>>> +responsible for applying this state to the assigned device's real c= onfig space >>>> +by setting or clearing the Interrupt Disable bit 10 in the Command = register. >>>> + >>>> +To avoid that the kernel overwrites the state user space wants to s= et, >>>> +KVM_ASSIGN_SET_INTX_MASK has to be called prior to updating the con= fig space. >>>> +Moreover, user space has to write back its own view on the Interrup= t Disable >>>> +bit whenever modifying the Command word. >>> >>> This is very confusing. I think we need to work on the wording, but >>> perhaps it's not worth hold up the patch. It seems the simplest, >> >> As I need another round anyway (see below), I'm open for better wordin= g >> suggestions. >=20 > Now that I know what it does, I'll probably write something just as > confusing, but here's a shot: >=20 > Allows userspace to mask PCI INTx interrupts from the assigned > device. The kernel will not deliver INTx interrupts to the > guest between setting and clearing of KVM_ASSIGN_SET_INTX_MASK > via this interface. This enables use of and emulation of PCI > 2.3 INTx disable command register behavior. > =20 > This may be used for both PCI 2.3 devices supporting INTx > disable natively and older devices lacking this support. > Userspace is responsible for emulating the read value of the > INTx disable bit in the guest visible PCI command register. > When modifying the INTx disable state, userspace should precede= > updating the physical device command register by calling this > ioctl to inform the kernel of the new intended INTx mask state.= > =20 > Note that the kernel uses the device INTx disable bit to > internally manage the device interrupt state for PCI 2.3 > devices. Reads of this register may therefore not match the > expected value. Writes should always use the guest intended > INTx disable value rather than attempting to read-copy-update > the current physical device state. Races between user and > kernel updates to the INTx disable bit are handled lazily in th= e > kernel. It's possible the device may generate unintended > interrupts, but they will not be injected into the guest. Sounds good, will pick it up. =2E.. >>>> @@ -759,6 +851,56 @@ msix_entry_out: >>>> } >>>> #endif >>>> =20 >>>> +static int kvm_vm_ioctl_set_pci_irq_mask(struct kvm *kvm, >>>> + struct kvm_assigned_pci_dev *assigned_dev) >>>> +{ >>>> + int r =3D 0; >>>> + struct kvm_assigned_dev_kernel *match; >>>> + >>>> + mutex_lock(&kvm->lock); >>>> + >>>> + match =3D kvm_find_assigned_dev(&kvm->arch.assigned_dev_head, >>>> + assigned_dev->assigned_dev_id); >>>> + if (!match) { >>>> + r =3D -ENODEV; >>>> + goto out; >>>> + } >>>> + >>>> + mutex_lock(&match->intx_mask_lock); >>>> + >>>> + match->flags &=3D ~KVM_DEV_ASSIGN_MASK_INTX; >>>> + match->flags |=3D assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX; >>>> + >>>> + if (match->irq_requested_type & KVM_DEV_IRQ_GUEST_INTX) { >>>> + if (assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX) { >>>> + kvm_set_irq(match->kvm, match->irq_source_id, >>>> + match->guest_irq, 0); >>>> + /* >>>> + * Masking at hardware-level is performed on demand, >>>> + * i.e. when an IRQ actually arrives at the host. >>>> + */ >>>> + } else { >>>> + /* >>>> + * Unmask the IRQ line. It may have been masked >>>> + * meanwhile if we aren't using PCI 2.3 INTx masking >>>> + * on the host side. >>>> + */ >>>> + spin_lock_irq(&match->intx_lock); >>>> + if (match->host_irq_disabled) { >>>> + enable_irq(match->host_irq); >>> >>> How do we not get an unbalanced enable here for PCI 2.3 devices? >> >> By performing both the disable and the host_irq_disabled update under >> intx_lock? Or which scenario do you see? >=20 > Do we ever do disable_irq() on a PCI 2.3 device? Seems like we only us= e > the intx API for them, and disable/enable_irq exclusively on non-2.3, s= o > if we can get here on a 2.3 device we have unbalanced enables. Am I > missing some nuance of host_irq_disabled that prevents 2.3 from getting= > here? Thanks, OK, now I got it. Yes, that's indeed a bug. I dug in an older version of this patch, and there I had multiple state values to encode if the line or the device was disabled. Will limit to pre-2.3 devices. Thanks, Jan --------------enig02698C98A630F7DC2A73F7AA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk9MKrUACgkQitSsb3rl5xTfWgCgma6ITm2rj5cRzRrd8VI2Fud0 /58AnjS5DEdrDZge4RckU+eICGJhxstI =JuxV -----END PGP SIGNATURE----- --------------enig02698C98A630F7DC2A73F7AA--