Message-ID: <1454090373.23148.11.camel@redhat.com>
From: Alex Williamson
Date: Fri, 29 Jan 2016 10:59:33 -0700
In-Reply-To: <1454051359.28516.28.camel@redhat.com>
References: <1451994098-6972-1-git-send-email-kraxel@redhat.com> <1454009759.7183.7.camel@redhat.com> <1454051359.28516.28.camel@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Subject: Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
To: Gerd Hoffmann
Cc: igvt-g@ml01.01.org, xen-devel@lists.xensource.com, Eduardo Habkost, Stefano Stabellini, qemu-devel@nongnu.org, Cao jin, vfio-users@redhat.com

On Fri, 2016-01-29 at 08:09 +0100, Gerd Hoffmann wrote:
>   Hi,
>
> > 1) The OpRegion MemoryRegion is mapped into system_memory through
> > programming of the 0xFC config space register.
> >  a) vfio-pci could pick an address to do this as it is realized.
> >  b) SeaBIOS/OVMF could program this.
> >
> > Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to
> > pick an address and mark it as e820 reserved.  I'm not sure how to pick
> > that address.
>
> Because of that I'd let the firmware pick the address and program 0xfc
> accordingly, i.e. (b).  seabios can simply malloc two pages and be done
> with it (any ram allocated by seabios will be tagged as e820 reserved).

Thanks for the tip that seabios allocated pages automatically become
e820 reserved, that simplifies things a bit.

> > 2) Read-only mappings version of 1)
> >
> > Discussion: Really nothing changes from the issues above, just prevents
> > any possibility of the guest modifying anything in the host.  Xen
> > apparently allows write access to the host page already.
>
> I think read-only is out.  Probably xen allows write access because
> guest drivers expect they have write access to the opregion, so the
> question is ...
>
> > 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
>
> whenever we give the guest a copy of the host opregion or direct access.
>
> > 4) Copy contents into a guest RAM location, mark it reserved, point to
> > it via 0xFC config as scratch register.
> >  a) Done by QEMU (vfio-pci)
> >  b) Done by SeaBIOS/OVMF
> >
> > Discussion: This is the most like real hardware.  4.a) has the usual
> > issue of how to pick an address, but the benefit of not requiring BIOS
> > changes (simply mark the RAM reserved via existing methods).  4.b) would
> > require passing a buffer containing the contents of the OpRegion via
> > fw_cfg and letting the BIOS do the setup.  The latter of course requires
> > modifying each BIOS for this support.
>
> Maybe we should define the interface as "guest writes 0xfc to pick
> address, qemu takes care to place opregion there".  That gives us the
> freedom to change the qemu implementation (either copy host opregion or
> map the host opregion) without breaking things.

Ok, so seabios allocates two pages, writes the base address of those
pages to 0xfc and looks to see whether the signature appears at that
address due to qemu mapping.  It verifies the size and does a
free/realloc if not the right size.  If the graphics signature does not
appear, free those pages and assume no opregion support.  If we later
decide to use a copy, we'd need to disable the 0xfc automagic mapping
and probably pass the data via fw_cfg.  Sound right?

Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
testing for any Intel VGA device, but I wonder if I should only be
enabling anything opregion if it also appears at a specific address.

> > Of course none of these support hotplug nor really can they since
> > reserved memory regions are not dynamic in the architecture.
>
> igd is chipset graphics and therefore not hotpluggable anyway (on
> physical hardware), I'd be very surprised if the guest drivers are
> prepared to handle hotplug.
>
> > Another thing I notice in this series is the access to PCI config space
> > of both the host bridge and the LPC bridge.  This prevents unprivileged
> > use cases
>
> lpc bridge is no problem, only pci id fields are copied over and
> unprivileged access is allowed for them.
>
> Copying the gfx registers of the host bridge is a problem indeed.

I would argue that both are really a problem, libvirt wants to put QEMU
in a container that prevents access to any host system files other than
those explicitly allowed.  Therefore libvirt needs to grant the process
access to the lpc sysfs config file even though it only needs user
visible register values.

> > Should vfio add
> > additional device specific regions to expose the config space of these
> > other devices?
>
> That is an option.  It is not clear yet which route we have to take
> though.  Testing shows that newer linux drivers work fine even without
> igd-passthru=on tweaks, whereas older linux kernels and windows drivers
> don't work even with this series applied and igd-passthru=on.  I'll go
> look at this as soon as I have test hardware (getting some is wip atm).

Ok, well we certainly don't need to necessarily tie config space of
those two devices together with opregion access, they can be added
later, but we should revisit before we make QEMU grab those config space
values itself, if we can make that functionality add value.  Thanks,

Alex
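
For reference, the firmware-side handshake proposed in this message (allocate
two pages, write their base address to 0xfc, look for the signature, verify
the size, free/realloc or give up) can be sketched as a toy model. This is
not SeaBIOS or QEMU code. ToyGuest, malloc_pages, free_pages,
write_config_0xfc, make_opregion and probe_opregion are hypothetical names
made up for illustration. The "IntelGraphicsMem" signature and the size field
(in KB, at offset 16) are assumptions about the OpRegion header layout that
the thread itself does not spell out.

```python
import struct

PAGE_SIZE = 4096
# Assumed OpRegion header layout: 16-byte signature, then a
# little-endian u32 size in KB. Not stated in the thread.
OPREGION_SIGNATURE = b"IntelGraphicsMem"


def make_opregion(size_kb):
    """Build a fake OpRegion image: header followed by zero padding."""
    header = OPREGION_SIGNATURE + struct.pack("<I", size_kb)
    return header + bytes(size_kb * 1024 - len(header))


class ToyGuest:
    """Flat guest RAM plus a device whose 0xfc write places the OpRegion."""

    def __init__(self, ram_size, opregion=None):
        self.ram = bytearray(ram_size)
        self.opregion = opregion      # None models a device without OpRegion
        self.next_free = PAGE_SIZE    # trivial bump allocator
        self.reserved = []            # stands in for e820 reserved ranges

    def malloc_pages(self, count):
        addr = self.next_free
        self.next_free += count * PAGE_SIZE
        self.reserved.append((addr, count * PAGE_SIZE))
        return addr

    def free_pages(self, addr):
        self.reserved = [r for r in self.reserved if r[0] != addr]

    def write_config_0xfc(self, addr):
        # Models "guest writes 0xfc to pick address, qemu takes care to
        # place opregion there". Whether qemu copies or maps is hidden.
        if self.opregion is not None:
            self.ram[addr:addr + len(self.opregion)] = self.opregion


def probe_opregion(guest, pages=2):
    """Return (addr, size) of the OpRegion, or None if unsupported."""
    addr = guest.malloc_pages(pages)
    guest.write_config_0xfc(addr)
    if bytes(guest.ram[addr:addr + 16]) != OPREGION_SIGNATURE:
        guest.free_pages(addr)        # no signature: assume no OpRegion
        return None
    (size_kb,) = struct.unpack_from("<I", guest.ram, addr + 16)
    size = size_kb * 1024
    if size != pages * PAGE_SIZE:
        # Wrong guess: free, reallocate the right size, program 0xfc again.
        guest.free_pages(addr)
        needed = (size + PAGE_SIZE - 1) // PAGE_SIZE
        addr = guest.malloc_pages(needed)
        guest.write_config_0xfc(addr)
    return addr, size
```

Calling probe_opregion(ToyGuest(64 * PAGE_SIZE, make_opregion(8))) exercises
the two-page case, and passing opregion=None exercises the "no opregion
support" path. The caller only ever sees the 0xfc write, so the model works
the same whether the device side copies or maps the host region.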