Date: Tue, 18 Oct 2011 17:56:38 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20111018155638.GA6362@redhat.com>
References: <20111018123305.GK28776@redhat.com> <4E9D734C.2060504@siemens.com>
	<20111018124818.GO28776@redhat.com> <4E9D786D.4060802@siemens.com>
	<20111018133719.GS28776@redhat.com> <4E9D831E.100@siemens.com>
	<20111018140156.GA4980@redhat.com> <4E9D886E.3090201@siemens.com>
	<20111018150834.GA6103@redhat.com> <4E9D99BE.2030600@siemens.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E9D99BE.2030600@siemens.com>
Subject: Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking
	of used vectors
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Alex Williamson <alex.williamson@redhat.com>, Marcelo Tosatti <mtosatti@redhat.com>, Avi Kivity <avi@redhat.com>, "kvm@vger.kernel.org" <kvm@vger.kernel.org>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>

On Tue, Oct 18, 2011 at 05:22:38PM +0200, Jan Kiszka wrote:
> On 2011-10-18 17:08, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
> >>>>>>>>>
> >>>>>>>>> I actually would not mind preallocating everything upfront, which is much
> >>>>>>>>> easier. But with your patch we get a silent failure or a drastic
> >>>>>>>>> slowdown, which is much more painful IMO.
> >>>>>>>>
> >>>>>>>> Again: have we already seen that limit? And where does it come from, if not
> >>>>>>>> from KVM?
> >>>>>>>
> >>>>>>> It's a hardware limitation of Intel APICs. The interrupt vector is encoded
> >>>>>>> in an 8-bit field of the MSI data, so you can have at most 256 of these.
> >>>>>>
> >>>>>> There should be no such limitation with the pseudo GSIs we use for MSI
> >>>>>> injection. They end up as MSI messages again, so the actual limit is
> >>>>>> (256 - reserved vectors) * number-of-cpus (on x86).
> >>>>>
> >>>>> This limits which CPUs can get the interrupt though.
> >>>>> Linux seems to have a global pool as it wants to be able to freely
> >>>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
> >>>>>
> >>>>> Anyway, why argue? There is a limitation, and it's not coming from KVM,
> >>>>> right?
> >>>>
> >>>> No, the limit we hit with MSI message routing is first of all the number
> >>>> of KVM GSIs, and there only the pseudo GSIs, which do not go to any
> >>>> interrupt controller with limited pins.
> >>>
> >>> I see KVM_MAX_IRQ_ROUTES is 1024.
> >>> This is > 256, so KVM does not seem to be the problem.
> >>
> >> We can generate far more than 256 distinct MSI messages. A message may
> >> encode the target CPU, for example, so that number enters the equation.
> > 
> > Yes, but the vector is encoded in 8 bits, so there are only 256
> > of them. The rest is stuff like delivery mode, which won't affect
> > which handler is run AFAIK. So while the problem might
> > appear with vector sharing, in practice there is
> > no vector sharing, so no problem :)
> > 
> >>>
> >>>> That could easily be lifted in the kernel if we run
> >>>> into shortages in practice.
> >>>
> >>> What I was saying is that resources are limited even without kvm.
> >>
> >> What other resources related to this particular case are exhausted
> >> before GSI numbers?
> >>
> >> Jan
> > 
> > distinct vectors
> 
> The guest is responsible for managing vectors, not KVM, not QEMU. And
> the guest will notice first when it runs out of them. So a virtio guest
> driver may not even request MSI-X support if that happens.

Absolutely. In theory, you can solve the problem from the guest.
But what I was saying is that in practice, the first X
devices get MSI-X and the others don't. Guests aren't doing anything smart, as
they are not designed with a huge number of devices in mind.

What we can do is solve the problem from management,
and to do that we can't delay allocation until a vector is used.

> What KVM has to do is just map an arbitrary MSI message
> (theoretically 64+32 bits, in practice of course much less) to
> a single GSI and vice versa. As there are fewer GSIs than possible MSI
> messages, we could run out of them when creating routes, statically or
> lazily.

Possible MSI messages != possible MSI vectors.
If two devices share a vector, the APIC won't be able
to distinguish them, even if e.g. the delivery mode
differs.

> What would probably address your concerns about lazy routing in the
> long term is to bypass that redundant GSI translation for dynamic
> messages, i.e. those that are not associated with an irqfd number or an
> assigned device IRQ. Something like a KVM_DELIVER_MSI ioctl that accepts
> address and data directly.
> 
> Jan

You are trying to work around the problem by not requiring
any resources per MSI vector. This just might work for some
uses (the ioctl) but isn't a generic solution (e.g. it won't work for irqfd).

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux