Date: Tue, 18 Oct 2011 17:56:38 +0200
From: "Michael S. Tsirkin"
Message-ID: <20111018155638.GA6362@redhat.com>
References: <20111018123305.GK28776@redhat.com> <4E9D734C.2060504@siemens.com>
 <20111018124818.GO28776@redhat.com> <4E9D786D.4060802@siemens.com>
 <20111018133719.GS28776@redhat.com> <4E9D831E.100@siemens.com>
 <20111018140156.GA4980@redhat.com> <4E9D886E.3090201@siemens.com>
 <20111018150834.GA6103@redhat.com> <4E9D99BE.2030600@siemens.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E9D99BE.2030600@siemens.com>
Subject: Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
To: Jan Kiszka
Cc: Alex Williamson, Marcelo Tosatti, Avi Kivity, "kvm@vger.kernel.org", "qemu-devel@nongnu.org"

On Tue, Oct 18, 2011 at 05:22:38PM +0200, Jan Kiszka wrote:
> On 2011-10-18 17:08, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
> >>>>>>>>>
> >>>>>>>>> I actually would not mind preallocating everything upfront which is much
> >>>>>>>>> easier. But with your patch we get a silent failure or a drastic
> >>>>>>>>> slowdown which is much more painful IMO.
> >>>>>>>>
> >>>>>>>> Again: did we already saw that limit?
> >>>>>>>> And where does it come from if not
> >>>>>>>> from KVM?
> >>>>>>>
> >>>>>>> It's a hardware limitation of intel APICs. interrupt vector is encoded
> >>>>>>> in an 8 bit field in msi address. So you can have at most 256 of these.
> >>>>>>
> >>>>>> There should be no such limitation with pseudo GSIs we use for MSI
> >>>>>> injection. They end up as MSI messages again, so actually 256 (-reserved
> >>>>>> vectors) * number-of-cpus (on x86).
> >>>>>
> >>>>> This limits which CPUs can get the interrupt though.
> >>>>> Linux seems to have a global pool as it wants to be able to freely
> >>>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
> >>>>>
> >>>>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
> >>>>> right?
> >>>>
> >>>> No, our limit we hit with MSI message routing are first of all KVM GSIs,
> >>>> and there only pseudo GSIs that do not go to any interrupt controller
> >>>> with limited pins.
> >>>
> >>> I see KVM_MAX_IRQ_ROUTES 1024
> >>> This is > 256 so KVM does not seem to be the problem.
> >>
> >> We can generate way more different MSI messages than 256. A message may
> >> encode the target CPU, so you have this number in the equation e.g.
> >
> > Yes but the vector is encoded in 256 bits. The rest is
> > stuff like delivery mode, which won't affect which
> > handler is run AFAIK. So while the problem might
> > appear with vector sharing, in practice there is
> > no vector sharing so no problem :)
> >
> >>>
> >>>> That could easily be lifted in the kernel if we run
> >>>> into shortages in practice.
> >>>
> >>> What I was saying is that resources are limited even without kvm.
> >>
> >> What other resources related to this particular case are exhausted
> >> before GSI numbers?
> >>
> >> Jan
> >
> > distinct vectors
> The guest is responsible for managing vectors, not KVM, not QEMU. And
> the guest will notice first when it runs out of them.
> So a virtio guest
> driver may not even request MSI-X support if that happens.

Absolutely. In theory, you can solve the problem from the guest.
But what I was saying is that, in practice, the first X devices get MSI-X
and the others don't. Guests aren't doing anything smart, as they are not
designed with a huge number of devices in mind.
What we can do is solve the problem from management. And to do that, we
can't delay allocation until a vector is used.

> What KVM has to do is just mapping an arbitrary MSI message
> (theoretically 64+32 bits, in practice it's much of course much less) to
> a single GSI and vice versa. As there are less GSIs than possible MSI
> messages, we could run out of them when creating routes, statically or
> lazily.

Possible MSI messages != possible MSI vectors.
If two devices share a vector, the APIC won't be able to distinguish them,
even if e.g. the delivery mode is different.

> What would probably help us long-term out of your concerns regarding
> lazy routing is to bypass that redundant GSI translation for dynamic
> messages, i.e. those that are not associated with an irqfd number or an
> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> address and data directly.
>
> Jan

You are trying to work around the problem by not requiring any resources
per MSI vector. This might just work for some uses (ioctl), but it isn't
a generic solution (e.g. it won't work for irqfd).

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
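
To illustrate the "messages != vectors" point, here is a rough sketch,
not code from QEMU or KVM. It assumes the usual x86 MSI layout (vector in
data bits 7:0, delivery mode in data bits 10:8, destination APIC ID in
address bits 19:12) and a simplified model in which only the destination
and the vector select the handler. All helper names are made up for the
example.

```python
# Illustrative decoding of an x86 MSI message.
# Helper names are hypothetical; they do not come from QEMU or KVM.

def msi_vector(data: int) -> int:
    """Vector: bits 7:0 of the MSI data word."""
    return data & 0xFF

def msi_delivery_mode(data: int) -> int:
    """Delivery mode: bits 10:8 of the MSI data word."""
    return (data >> 8) & 0x7

def msi_dest_id(address: int) -> int:
    """Destination APIC ID: bits 19:12 of the MSI address."""
    return (address >> 12) & 0xFF

def handler_key(address: int, data: int) -> tuple:
    """Simplifying assumption: only target CPU and vector pick the handler."""
    return (msi_dest_id(address), msi_vector(data))

# Two distinct (address, data) messages: same destination and vector,
# different delivery mode.
msg_a = (0xFEE01000, 0x0041)  # dest 1, vector 0x41, delivery mode 0
msg_b = (0xFEE01000, 0x0141)  # dest 1, vector 0x41, delivery mode 1

# The messages differ, yet they collapse onto one handler slot.
same_slot = handler_key(*msg_a) == handler_key(*msg_b)
```

So under these assumptions the number of distinct messages can be much
larger than the number of handlers the guest can actually tell apart,
which is bounded by 256 vectors (minus reserved ones) per destination.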