From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from sunset.davemloft.net (unknown [74.93.104.97])
	by ozlabs.org (Postfix) with ESMTP id 09837DDDEE;
	Mon, 29 Jan 2007 15:19:45 +1100 (EST)
Date: Sun, 28 Jan 2007 20:19:38 -0800 (PST)
Message-Id: <20070128.201938.102578509.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
To: benh@kernel.crashing.org
Cc: greg@kroah.com, kyle@parisc-linux.org, linuxppc-dev@ozlabs.org,
	brice@myri.com, shaohua.li@intel.com,
	linux-pci@atrey.karlin.mff.cuni.cz, ebiederm@xmission.com
Subject: Re: [RFC/PATCH 0/16] Ops based MSI Implementation
In-Reply-To: <1170040622.26655.187.camel@localhost.localdomain>
References: <1170032301.26655.140.camel@localhost.localdomain>
	<20070128.171309.11624572.davem@davemloft.net>
	<1170040622.26655.187.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
List-Id: Linux on PowerPC Developers Mail List

From: Benjamin Herrenschmidt
Date: Mon, 29 Jan 2007 14:17:02 +1100

> However, your vector space is per-bus (which is good), so you do need
> to allocate linux virtual irqs and map them to the actual MSI vectors
> like we do on powerpc.

Yes, I already use virtual irqs on sparc64 so it'll be easy to
implement.  Those "devino" numbers are used with "device numbers" to
create system interrupt numbers, and I'd point the virtual IRQ at that
(rough sketch near the end of this mail).

> I think Eric's framework would work for you.  As long as you don't
> need to do something special for MSI-X, which I don't think you do...

That's my current understanding as well.

> - Try to force our stuff in by implementing x86 completely (and Altix)
> under Michael's infrastructure and then try to convince
> Andrew/Greg/Linus to take it.  Fairly unlikely.  We do have a somewhat
> "gradual" approach to it which consist of having Michael's code at the
> toplevel, Eric's code hooked in as if it was a hypervisor, and then
> gradually "merge" the raw backend with the x86 code, but it doesn't
> seem very sexy (to me neither).

Well, unless you have a working alternative for the x86/ia64/etc. folks,
you have no alternative to Eric's patches to offer for consideration.

I think in the future we'll see more stuff like RTAS; it's the only way,
outside of hardware filtering in the PCI-E bridges, to provide real
isolation between PCI devices that get divided into different logical
domains.  And full isolation is absolutely required for proper
virtualization.

I think Eric really needs to consider the problem of logical domains,
and what the problem is that the RTAS folks are trying to solve.  You
can't just say something sucks without providing a reasonable
alternative suggestion.

Eric isn't responding to any of my emails on this matter, and that is
not helping at all.  If he would, on the other hand, make constructive
suggestions about how to implement isolation between independent PCI
devices on the same PCI bus which belong to different logical domains,
accounting for MSI, we could actually have a real conversation.

You can't implement isolation unless you either:

1) strictly control what devices can do to other devices on the PCI
   domain, or

2) filter transactions in the PCI bridges so that PCI devices cannot
   send arbitrary junk to each other.

#2 is prohibitively expensive and complicated because it requires
specialized hardware.  #1 is low cost in that all you need to do is
make PCI config space accesses and MSI setup go through the hypervisor.
That's why systems implement #1 to give full isolation.
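To put that in concrete terms, the mediated path is about this much
code.  This is only a sketch: hv_config_get() and hv_config_put() are
invented stand-ins for whatever calls the hypervisor actually exposes;
only the pci_ops hookup is the stock Linux one.

#include <linux/pci.h>

/* Hypothetical hypervisor calls; assume non-zero return on refusal. */
extern int hv_config_get(u8 busno, unsigned int devfn, int where,
			 int size, u32 *value);
extern int hv_config_put(u8 busno, unsigned int devfn, int where,
			 int size, u32 value);

/* Approach #1: every config space access traps into the hypervisor,
 * which refuses accesses to devices owned by another logical domain.
 */
static int hv_read_config(struct pci_bus *bus, unsigned int devfn,
			  int where, int size, u32 *value)
{
	/* Hypervisor validates that this domain owns (bus, devfn). */
	if (hv_config_get(bus->number, devfn, where, size, value))
		return PCIBIOS_DEVICE_NOT_FOUND;
	return PCIBIOS_SUCCESSFUL;
}

static int hv_write_config(struct pci_bus *bus, unsigned int devfn,
			   int where, int size, u32 value)
{
	if (hv_config_put(bus->number, devfn, where, size, value))
		return PCIBIOS_DEVICE_NOT_FOUND;
	return PCIBIOS_SUCCESSFUL;
}

static struct pci_ops hv_pci_ops = {
	.read	= hv_read_config,
	.write	= hv_write_config,
};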
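On the interrupt side, the devino mapping I mentioned earlier amounts
to something like the following; again just a sketch, where
hv_devino_to_sysino() and virq_map_sysino() are invented names for the
hypervisor call and the virtual IRQ allocator:

#include <linux/types.h>

/* Both helpers are hypothetical names, not a real API. */
extern unsigned long hv_devino_to_sysino(u32 devhandle, u32 devino);
extern unsigned int virq_map_sysino(unsigned long sysino);

/* Compose the per-bus MSI "devino" with the bus's device handle to
 * get a system-wide interrupt number, then point a Linux virtual IRQ
 * at it.
 */
static unsigned int msi_devino_to_virq(u32 devhandle, u32 devino)
{
	/* Hypervisor builds the global interrupt number... */
	unsigned long sysino = hv_devino_to_sysino(devhandle, devino);

	/* ...and we hang a freshly allocated virtual IRQ off it. */
	return virq_map_sysino(sysino);
}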
That's why I think the whole MSI hypervisor thing done by RTAS is
absolutely reasonable and something we should support.  It's NOT like
TCP Offload Engines and the like, not at all, and it's quite upsetting
to see Eric characterize it in that way.  It's a protection and
isolation facility, not a way to hide hardware behind binary blobs.
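For reference, the per-platform hook all of this implies is tiny.
Something like the following, where the struct and the rtas_*()
functions are invented for illustration and not lifted from Michael's
actual patches:

#include <linux/pci.h>

/* rtas_change_msi()/rtas_disable_msi() are stand-ins for the real
 * firmware calls.
 */
extern int rtas_change_msi(struct pci_dev *dev, int nvec, int type);
extern void rtas_disable_msi(struct pci_dev *dev);

/* Hypothetical shape of an ops-based MSI backend.  An RTAS (or any
 * other hypervisor) implementation slots in here, so MSI setup gets
 * mediated the same way config space does.
 */
struct msi_ops {
	int	(*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
	void	(*teardown_msi_irqs)(struct pci_dev *dev);
};

static int rtas_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
{
	/* Firmware assigns the MSIs and refuses devices belonging
	 * to another logical domain.
	 */
	return rtas_change_msi(dev, nvec, type);
}

static void rtas_teardown_msi_irqs(struct pci_dev *dev)
{
	rtas_disable_msi(dev);
}

static struct msi_ops rtas_msi_ops = {
	.setup_msi_irqs		= rtas_setup_msi_irqs,
	.teardown_msi_irqs	= rtas_teardown_msi_irqs,
};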