From: Jan Kiszka
Date: Wed, 19 Oct 2011 00:13:49 +0200
Subject: Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
To: "Michael S. Tsirkin"
Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, "kvm@vger.kernel.org", "qemu-devel@nongnu.org"
Message-ID: <4E9DFA1D.4050905@web.de>
In-Reply-To: <20111018214006.GB7216@redhat.com>

On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
>>>>>>
>>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>>>> only a current interpretation of one specific arch. )
>>>>>
>>>>> Confused. The vector mask is 8 bits, the rest is destination ID etc.
>>>>
>>>> Right, but those additional bits like the destination make different
>>>> messages. We have to encode those 24 bits into a unique GSI number and
>>>> restore them (by table lookup) on APIC injection inside the kernel. If
>>>> we only had to encode 256 different vectors, we would be done already.
>>>
>>> Right. But in practice guests always use distinct vectors (from the
>>> 256 available) for distinct messages. This is because the vector seems
>>> to be the only thing that gets communicated by the APIC to the software.
>>>
>>> So e.g. a table with 256 entries, with the extra 1024-256 used as
>>> spill-over for guests that do something unexpected, would work really well.
>>
>> Linux already manages vectors on a per-CPU basis. For efficiency
>> reasons, it does not exploit the full range of 256 vectors but actually
>> allocates them in - IIRC - steps of 16. So I would not be surprised to
>> find lots of vector number "collisions" when looking over a full set of
>> CPUs in a system.
>>
>> Really, these considerations do not help us. We must store all 96 bits,
>> already for the sake of other KVM architectures that want MSI routing.
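(Side note, to put numbers on "all 96 bits": that is simply the 64-bit
address plus the 32-bit data word of the message. On x86, only a subset
of those bits currently carries information - the destination ID and a
few control bits in the address, the vector, delivery mode and trigger
bits in the data - which is where the 24 distinguishing bits above come
from. A purely illustrative sketch, not the actual QEMU/KVM structs:

  #include <stdint.h>

  /* What a cached/routed MSI message has to carry - the whole message,
   * not just the 8-bit vector. Names are made up for illustration. */
  typedef struct MSIMessageExample {
      uint64_t address;   /* x86: 0xfeexxxxx, destination ID, dest mode, ... */
      uint32_t data;      /* x86: vector, delivery mode, trigger/level bits */
  } MSIMessageExample;

The kernel routing entry - kvm_irq_routing_msi with address_lo,
address_hi and data, if I remember kvm.h correctly - stores exactly
this, and a GSI is nothing more than an index pointing at such an entry.)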
>>>
>>>
>>>>>
>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
>>>>>>> messages, we could run out of them when creating routes, statically or
>>>>>>> lazily.
>>>>>>>
>>>>>>> What would probably help us long-term out of your concerns regarding
>>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
>>>>>>> messages, i.e. those that are not associated with an irqfd number or an
>>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>>>>>> address and data directly.
>>>>>>
>>>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>>>
>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>>>> host kernels would force us to do searches for already cached messages.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Hmm, I'm not all that sure. The existing design really allows
>>>>> caching the route in various smart ways. We currently do
>>>>> this for irqfd but this can be extended to ioctls.
>>>>> If we just let the guest inject arbitrary messages,
>>>>> that becomes much more complex.
>>>>
>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>>>> routes from an MSI message to a GSI number (+ they configure the related
>>>> backends).
>>>
>>> Yes, it's a very flexible API but it would be very hard to optimize.
>>> GSIs let us do the slow path setup, but they make it easy
>>> to optimize target lookup in the kernel.
>>
>> Users of the API above have no need to know anything about GSIs. They
>> are an artifact of the KVM-internal interface between user space and
>> kernel now - thanks to the MSIRoutingCache encapsulation.
>
> Yes, but I am saying that the API above can't be implemented
> more efficiently than now: you will have to scan all APICs on each MSI.
> The GSI implementation can be optimized: decode the vector once,
> if it matches a single vcpu, store that vcpu and use it when sending
> interrupts.

Sorry, missed that you switched to the kernel side. What information do
you want to cache there that cannot be easily obtained by looking at a
concrete message? I do not see any. Once you have checked that the
delivery mode targets a specific CPU, you could address it directly. Or
are you thinking about some cluster mode?
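(Roughly what I mean, as pseudo-C - all helper names are made up here,
this is not against any particular kernel tree:

  /* Deliver a dynamic MSI without any routing entry: decode the message
   * once and, for the common "fixed delivery to one CPU" case, go
   * straight to the target instead of scanning all APICs. */
  static void deliver_dynamic_msi(struct kvm *kvm, u64 address, u32 data)
  {
      u8 dest_id = (address >> 12) & 0xff;   /* x86 destination ID field */
      u8 vector  = data & 0xff;

      if (msi_is_fixed_unicast(address, data)) {
          struct kvm_vcpu *vcpu = vcpu_by_apic_id(kvm, dest_id);
          if (vcpu) {
              lapic_set_vector(vcpu, vector);
              return;
          }
      }
      /* logical destinations, lowest priority, broadcast: slow path */
      deliver_to_all_matching_apics(kvm, address, data);
  }

Everything the fast path needs can be derived from the message itself,
so there is nothing a per-GSI cache could remember that is not already
encoded in those bits.)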
>
>
>>>
>>> An analogy would be if read/write operated on file paths.
>>> fd makes it easy to do permission checks and slow lookups
>>> in one place. GSI happens to work like this (maybe, by accident).
>>
>> Think of an opaque file handle as a MSIRoutingCache object. And it
>> encodes not only the routing handle but also other useful associated
>> information we need from time to time - internally, not in the device
>> models.
>
> Forget qemu abstractions, I am talking about data path
> optimizations in the kernel, in kvm. From that POV the point of an fd is
> not that it is opaque. It is that it's an index into an array that
> can be used for fast lookups.
>
>>>>>
>>>>> Another concern is mask bit emulation. We currently
>>>>> handle the mask bit in userspace, but patches
>>>>> to do that in the kernel for assigned devices were seen,
>>>>> and IMO we might want to do that for virtio as well.
>>>>>
>>>>> For that to work the mask bit needs to be tied to
>>>>> a specific gsi or specific device, which does not
>>>>> work if we just inject arbitrary writes.
>>>>
>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>
>>>> Jan
>>>>
>>>
>>> I do.
>>> How would we maintain a mask/pending bit in the kernel if we are not
>>> even supplied info on all available vectors?
>>
>> It's tricky to discuss an undefined interface (there only exists an
>> outdated proposal for kvm device assignment). But I suppose that user
>> space will have to define the maximum number of vectors when creating an
>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>
>> The number of used vectors will correlate with the number of registered
>> irqfds (in the case of vhost or vfio; device assignment still has
>> SET_MSIX_NR). As kernel space would then be responsible for mask
>> processing, user space would keep vectors registered with irqfds, even
>> if they are masked. It could just continue to play the trick and drop
>> data=0 vectors.
>
> Which trick? We don't play any tricks except for device assignment.
>
>> The point here is: All those steps have _nothing_ to do with the generic
>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>> an API. In contrast, msix_vector_use/unuse were generic services that
>> were actually created to please KVM requirements. But if we split that
>> up, we can address the generic MSI-X requirements in a way that makes
>> more sense for emulated devices (and particularly msix_vector_use makes
>> no sense for emulation).
>>
>> Jan
>>
>
> We need at least msix_vector_unuse

Not at all. We rather need some qemu_irq_set(level) equivalent for MSI.
The spec requires that the device clears the pending bit when the reason
for the interrupt is removed. And any removal that is device
model-originated should simply be signaled like an irq de-assert. Vector
"unusage" is just one such reason.

> - IMO it makes more sense than "clear
> pending vector". msix_vector_use is good to keep around for symmetry:
> who knows whether we'll need to allocate resources per vector
> in the future.

For MSI[-X], the spec is already there, and we know that there is no need
for further resources when emulating it. Only KVM has special needs.

Jan
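PS: To illustrate the level-style interface above - the signature is
invented on the spot, only the semantics matter:

  /* Level-style MSI-X signalling from the device model:
   *   level != 0: raise the vector - what msix_notify() does today,
   *               i.e. send the message if unmasked, else set pending
   *   level == 0: the reason for the interrupt is gone, so clear a
   *               still pending (masked) vector again
   */
  void msix_set_irq(PCIDevice *dev, unsigned vector, int level)
  {
      if (level) {
          msix_notify(dev, vector);
      } else if (msix_is_masked(dev, vector)) {   /* msix.c-internal helpers */
          msix_clr_pending(dev, vector);
      }
  }

A device model that no longer has a reason for a vector pending - today's
msix_vector_unuse case, but also any other removal of the cause - would
simply call this with level = 0.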