From: "Michael S. Tsirkin"
Subject: Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
Date: Tue, 18 Oct 2011 17:56:38 +0200
Message-ID: <20111018155638.GA6362@redhat.com>
In-Reply-To: <4E9D99BE.2030600@siemens.com>
To: Jan Kiszka
Cc: Avi Kivity, Marcelo Tosatti, kvm@vger.kernel.org, Alex Williamson, qemu-devel@nongnu.org

On Tue, Oct 18, 2011 at 05:22:38PM +0200, Jan Kiszka wrote:
> On 2011-10-18 17:08, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 04:08:46PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 16:01, Michael S. Tsirkin wrote:
> >>>>>>>>>
> >>>>>>>>> I actually would not mind preallocating everything upfront which is much
> >>>>>>>>> easier. But with your patch we get a silent failure or a drastic
> >>>>>>>>> slowdown which is much more painful IMO.
> >>>>>>>>
> >>>>>>>> Again: did we already see that limit? And where does it come from if not
> >>>>>>>> from KVM?
> >>>>>>>
> >>>>>>> It's a hardware limitation of Intel APICs. The interrupt vector is encoded
> >>>>>>> in an 8 bit field in the MSI message. So you can have at most 256 of these.
> >>>>>>
> >>>>>> There should be no such limitation with pseudo GSIs we use for MSI
> >>>>>> injection.
> >>>>>> They end up as MSI messages again, so actually 256 (-reserved
> >>>>>> vectors) * number-of-cpus (on x86).
> >>>>>
> >>>>> This limits which CPUs can get the interrupt though.
> >>>>> Linux seems to have a global pool as it wants to be able to freely
> >>>>> balance vectors between CPUs. Or, consider a guest with a single CPU :)
> >>>>>
> >>>>> Anyway, why argue - there is a limitation, and it's not coming from KVM,
> >>>>> right?
> >>>>
> >>>> No, the limit we hit with MSI message routing is first of all KVM GSIs,
> >>>> and there only pseudo GSIs that do not go to any interrupt controller
> >>>> with limited pins.
> >>>
> >>> I see KVM_MAX_IRQ_ROUTES 1024
> >>> This is > 256 so KVM does not seem to be the problem.
> >>
> >> We can generate way more different MSI messages than 256. A message may
> >> encode the target CPU, so you have this number in the equation e.g.
> >
> > Yes but the vector is encoded in 8 bits (256 values). The rest is
> > stuff like delivery mode, which won't affect which
> > handler is run AFAIK. So while the problem might
> > appear with vector sharing, in practice there is
> > no vector sharing so no problem :)
> >
> >>>
> >>>> That could easily be lifted in the kernel if we run
> >>>> into shortages in practice.
> >>>
> >>> What I was saying is that resources are limited even without kvm.
> >>
> >> What other resources related to this particular case are exhausted
> >> before GSI numbers?
> >>
> >> Jan
> >
> > distinct vectors
>
> The guest is responsible for managing vectors, not KVM, not QEMU. And
> the guest will notice first when it runs out of them. So a virtio guest
> driver may not even request MSI-X support if that happens.

Absolutely. You can solve the problem from the guest in theory.
But what I was saying is that, in practice, the first X devices
get MSI-X and the others don't.
Guests aren't doing anything smart as they are not designed with
a huge number of devices in mind.
What we can do is solve the problem from management.
And to do that, we can't delay allocation until the vector is first used.

> What KVM has to do is just mapping an arbitrary MSI message
> (theoretically 64+32 bits, in practice it's of course much less) to
> a single GSI and vice versa. As there are fewer GSIs than possible MSI
> messages, we could run out of them when creating routes, statically or
> lazily.

Possible MSI messages != possible MSI vectors.
If two devices share a vector, the APIC won't be able to distinguish
them even though e.g. the delivery mode is different.

> What would probably help us long-term out of your concerns regarding
> lazy routing is to bypass that redundant GSI translation for dynamic
> messages, i.e. those that are not associated with an irqfd number or an
> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> address and data directly.
>
> Jan

You are trying to work around the problem by not requiring
any resources per MSI vector. This just might work for some
uses (ioctl) but isn't a generic solution (e.g. won't work for irqfd).

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
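
To illustrate the "messages != vectors" point above, here is a minimal C sketch. It assumes the usual x86 MSI layout: vector in bits 7:0 of the data word, delivery mode in bits 10:8, destination ID in bits 19:12 of the address. The struct and helper names are hypothetical and are not taken from QEMU or KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 MSI field layout (illustration only). */
#define MSI_DATA_VECTOR_MASK     0x00ffu  /* data bits 7:0    */
#define MSI_DATA_DELIVERY_SHIFT  8        /* data bits 10:8   */
#define MSI_DATA_DELIVERY_MASK   0x7u
#define MSI_ADDR_DEST_ID_SHIFT   12       /* address bits 19:12 */
#define MSI_ADDR_DEST_ID_MASK    0xffu

/* Hypothetical container for one MSI message. */
struct msi_msg_sketch {
    uint64_t address;
    uint32_t data;
};

/* The 8-bit vector: at most 256 distinct values per destination. */
static inline uint8_t msi_vector(struct msi_msg_sketch m)
{
    return (uint8_t)(m.data & MSI_DATA_VECTOR_MASK);
}

static inline uint8_t msi_delivery_mode(struct msi_msg_sketch m)
{
    return (uint8_t)((m.data >> MSI_DATA_DELIVERY_SHIFT) &
                     MSI_DATA_DELIVERY_MASK);
}

static inline uint8_t msi_dest_id(struct msi_msg_sketch m)
{
    return (uint8_t)((m.address >> MSI_ADDR_DEST_ID_SHIFT) &
                     MSI_ADDR_DEST_ID_MASK);
}

/* Two messages are identical only if every bit matches. */
static inline bool msi_same_message(struct msi_msg_sketch a,
                                    struct msi_msg_sketch b)
{
    return a.address == b.address && a.data == b.data;
}

/* Same destination and same vector: the same handler would run. */
static inline bool msi_same_handler(struct msi_msg_sketch a,
                                    struct msi_msg_sketch b)
{
    return msi_dest_id(a) == msi_dest_id(b) &&
           msi_vector(a) == msi_vector(b);
}
```

With these helpers, two messages that differ only in delivery mode compare as different messages but still resolve to the same destination and vector, which is the collision described in the reply.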