From: Jan Kiszka
Date: Wed, 19 Oct 2011 00:13:49 +0200
Subject: Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
To: "Michael S. Tsirkin"
Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, "kvm@vger.kernel.org", "qemu-devel@nongnu.org"
Message-ID: <4E9DFA1D.4050905@web.de>
In-Reply-To: <20111018214006.GB7216@redhat.com>

On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
>>>>>>
>>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>>>> only a current interpretation of one specific arch. )
>>>>>
>>>>> Confused. The vector mask is 8 bits, the rest is destination ID etc.
>>>>
>>>> Right, but those additional bits like the destination make different
>>>> messages. We have to encode those 24 bits into a unique GSI number and
>>>> restore them (by table lookup) on APIC injection inside the kernel. If
>>>> we only had to encode 256 different vectors, we would be done already.
>>>
>>> Right. But in practice guests always use distinct vectors (from the
>>> 256 available) for distinct messages. This is because the vector seems
>>> to be the only thing that gets communicated by the APIC to the software.
>>>
>>> So e.g. a table with 256 entries, with the extra 1024-256 used as
>>> spill-over for guests that do something unexpected, would work really well.
>>
>> Linux already manages vectors on a per-CPU basis. For efficiency
>> reasons, it does not exploit the full range of 256 vectors but actually
>> allocates them in - IIRC - steps of 16. So I would not be surprised to
>> find lots of vector number "collisions" when looking over a full set of
>> CPUs in a system.
>>
>> Really, these considerations do not help us. We must store all 96 bits,
>> already for the sake of other KVM architectures that want MSI routing.
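(Side note, to put numbers on "all 96 bits": that is simply the 64-bit
address plus the 32-bit data word of the message. On x86, only a subset
of those bits currently carries information - the destination ID and a
few control bits in the address, the vector, delivery mode and trigger
bits in the data - which is where the 24 distinguishing bits above come
from. A purely illustrative sketch, not the actual QEMU/KVM structs:

  #include <stdint.h>

  /* What a cached/routed MSI message has to carry - the whole message,
   * not just the 8-bit vector. Names are made up for illustration. */
  typedef struct MSIMessageExample {
      uint64_t address;   /* x86: 0xfeexxxxx, destination ID, dest mode, ... */
      uint32_t data;      /* x86: vector, delivery mode, trigger/level bits */
  } MSIMessageExample;

The kernel routing entry - kvm_irq_routing_msi with address_lo,
address_hi and data, if I remember kvm.h correctly - stores exactly
this, and a GSI is nothing more than an index pointing at such an entry.)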
>>>
>>>
>>>>>
>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
>>>>>>> messages, we could run out of them when creating routes, statically or
>>>>>>> lazily.
>>>>>>>
>>>>>>> What would probably help us long-term out of your concerns regarding
>>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>>>>>> messages, i.e. those that are not associated with an irqfd number or an
>>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>>>>>> address and data directly.
>>>>>>
>>>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>>>
>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>>>> host kernels would force us to do searches for already cached messages.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Hmm, I'm not all that sure. The existing design really allows
>>>>> caching the route in various smart ways. We currently do
>>>>> this for irqfd but this can be extended to ioctls.
>>>>> If we just let the guest inject arbitrary messages,
>>>>> that becomes much more complex.
>>>>
>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>>>> routes from an MSI message to a GSI number (+ they configure the related
>>>> backends).
>>>
>>> Yes, it's a very flexible API but it would be very hard to optimize.
>>> GSIs let us do the slow path setup, but they make it easy
>>> to optimize target lookup in the kernel.
>>
>> Users of the API above have no need to know anything about GSIs. They
>> are an artifact of the KVM-internal interface between user space and
>> kernel now - thanks to the MSIRoutingCache encapsulation.
>
> Yes, but I am saying that the API above can't be implemented
> more efficiently than now: you will have to scan all APICs on each MSI.
> The GSI implementation can be optimized: decode the vector once,
> if it matches a single vcpu, store that vcpu and use it when sending
> interrupts.

Sorry, missed that you switched to the kernel side. What information do
you want to cache there that cannot be easily obtained by looking at a
concrete message? I do not see any. Once you have checked that the
delivery mode targets a specific CPU, you could address it directly. Or
are you thinking about some cluster mode?
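(Roughly what I mean, as pseudo-C - all helper names are made up here,
this is not against any particular kernel tree:

  /* Deliver a dynamic MSI without any routing entry: decode the message
   * once and, for the common "fixed delivery to one CPU" case, go
   * straight to the target instead of scanning all APICs. */
  static void deliver_dynamic_msi(struct kvm *kvm, u64 address, u32 data)
  {
      u8 dest_id = (address >> 12) & 0xff;   /* x86 destination ID field */
      u8 vector  = data & 0xff;

      if (msi_is_fixed_unicast(address, data)) {
          struct kvm_vcpu *vcpu = vcpu_by_apic_id(kvm, dest_id);
          if (vcpu) {
              lapic_set_vector(vcpu, vector);
              return;
          }
      }
      /* logical destinations, lowest priority, broadcast: slow path */
      deliver_to_all_matching_apics(kvm, address, data);
  }

Everything the fast path needs can be derived from the message itself,
so there is nothing a per-GSI cache could remember that is not already
encoded in those bits.)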
>
>
>>>
>>> An analogy would be if read/write operated on file paths.
>>> fd makes it easy to do permission checks and slow lookups
>>> in one place. GSI happens to work like this (maybe, by accident).
>>
>> Think of an opaque file handle as a MSIRoutingCache object. And it
>> encodes not only the routing handle but also other useful associated
>> information we need from time to time - internally, not in the device
>> models.
>
> Forget qemu abstractions, I am talking about data path
> optimizations in the kernel, in kvm. From that POV the point of an fd is
> not that it is opaque. It is that it's an index into an array that
> can be used for fast lookups.
>
>>>>>
>>>>> Another concern is mask bit emulation. We currently
>>>>> handle the mask bit in userspace, but patches
>>>>> to do that in the kernel for assigned devices were seen,
>>>>> and IMO we might want to do that for virtio as well.
>>>>>
>>>>> For that to work the mask bit needs to be tied to
>>>>> a specific gsi or specific device, which does not
>>>>> work if we just inject arbitrary writes.
>>>>
>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>
>>>> Jan
>>>>
>>>
>>> I do.
>>> How would we maintain a mask/pending bit in the kernel if we are not
>>> even supplied info on all available vectors?
>>
>> It's tricky to discuss an undefined interface (there only exists an
>> outdated proposal for kvm device assignment). But I suppose that user
>> space will have to define the maximum number of vectors when creating an
>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>
>> The number of used vectors will correlate with the number of registered
>> irqfds (in the case of vhost or vfio; device assignment still has
>> SET_MSIX_NR). As kernel space would then be responsible for mask
>> processing, user space would keep vectors registered with irqfds, even
>> if they are masked. It could just continue to play the trick and drop
>> data=0 vectors.
>
> Which trick? We don't play any tricks except for device assignment.
>
>> The point here is: All those steps have _nothing_ to do with the generic
>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>> an API. In contrast, msix_vector_use/unuse were generic services that
>> were actually created to please KVM requirements. But if we split that
>> up, we can address the generic MSI-X requirements in a way that makes
>> more sense for emulated devices (and particularly msix_vector_use makes
>> no sense for emulation).
>>
>> Jan
>>
>
> We need at least msix_vector_unuse

Not at all. We rather need some qemu_irq_set(level) equivalent for MSI.
The spec requires that the device clears the pending bit when the reason
for the interrupt is removed. And any removal that is device
model-originated should simply be signaled like an irq de-assert. Vector
"unusage" is just one such reason.

> - IMO it makes more sense than "clear
> pending vector". msix_vector_use is good to keep around for symmetry:
> who knows whether we'll need to allocate resources per vector
> in the future.

For MSI[-X], the spec is already there, and we know that there is no need
for further resources when emulating it. Only KVM has special needs.

Jan
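PS: To illustrate the level-style interface above - the signature is
invented on the spot, only the semantics matter:

  /* Level-style MSI-X signalling from the device model:
   *   level != 0: raise the vector - what msix_notify() does today,
   *               i.e. send the message if unmasked, else set pending
   *   level == 0: the reason for the interrupt is gone, so clear a
   *               still pending (masked) vector again
   */
  void msix_set_irq(PCIDevice *dev, unsigned vector, int level)
  {
      if (level) {
          msix_notify(dev, vector);
      } else if (msix_is_masked(dev, vector)) {   /* msix.c-internal helpers */
          msix_clr_pending(dev, vector);
      }
  }

A device model that no longer has a reason for a vector pending - today's
msix_vector_unuse case, but also any other removal of the cause - would
simply call this with level = 0.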