From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60514) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XJS1N-0005J1-BE for qemu-devel@nongnu.org; Mon, 18 Aug 2014 14:51:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XJS1I-0001NI-Gq for qemu-devel@nongnu.org; Mon, 18 Aug 2014 14:51:09 -0400 Received: from mout.web.de ([212.227.17.12]:60147) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XJS1I-0001N4-74 for qemu-devel@nongnu.org; Mon, 18 Aug 2014 14:51:04 -0400 Message-ID: <53F24B0F.1070805@web.de> Date: Mon, 18 Aug 2014 20:50:55 +0200 From: Jan Kiszka MIME-Version: 1.0 References: <1407740702-4086-1-git-send-email-tamlokveer@gmail.com> <20140814111511.GS31346@redhat.com> <53ECA71F.6070700@web.de> <1408077757.11008.89.camel@ori.omang.mine.nu> <1408101325.14053.41.camel@abi.no.oracle.com> <1408175661.11008.113.camel@ori.omang.mine.nu> <53EF1A2D.4050709@web.de> <53EF1AAB.8050304@web.de> <1408379654.11008.201.camel@ori.omang.mine.nu> In-Reply-To: <1408379654.11008.201.camel@ori.omang.mine.nu> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="G3moOe5j3J4RFssib8J38XI5LcPfDofdA" Subject: Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Knut Omang Cc: "Michael S. Tsirkin" , Stefan Weil , qemu-devel , Le Tan , Alex Williamson , Anthony Liguori , Paolo Bonzini This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --G3moOe5j3J4RFssib8J38XI5LcPfDofdA Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2014-08-18 18:34, Knut Omang wrote: > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote: >> On 2014-08-16 10:45, Jan Kiszka wrote: >>> On 2014-08-16 09:54, Knut Omang wrote: >>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote: >>>>> Hi Knut, >>>>> >>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang : >>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote: >>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote: >>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote: >>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) emu= lation to q35 >>>>>>>>>> chipset. The major job in these patches is to add support for = emulating Intel >>>>>>>>>> IOMMU according to the VT-d specification, including basic res= ponses to CSRs >>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory ad= dress >>>>>>>>>> translations. >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> Looks very good overall, I noted some coding style issues - I d= idn't >>>>>>>>> bother reporting each issue in every place where it appears - r= eported >>>>>>>>> each issue once only, so please find and fix all instances of e= ach >>>>>>>>> issue. >>>>>>>> >>>>>>>> BTW, because I was in urgent need for virtual test environment f= or >>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches: >>>>>>>> >>>>>>>> http://git.kiszka.org/?p=3Dqemu.git;a=3Dshortlog;h=3Drefs/heads/= queues/vtd-intremap >>>>>>>> >>>>>>>> The approach likely needs further discussions and refinements bu= t it >>>>>>>> already allows me to work on top with our hypervisor, and also L= inux. >>>>>>>> You can see from the last commit that Le's work made it pretty e= asy to >>>>>>>> build this on top. >>>>>>> >>>>>>> Le, >>>>>>> >>>>>>> I have tried Jan's branch with my device setup which consists of = a >>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device >>>>>>> ioh3420,slot=3D0 ) and a pcie device plugged into that root port,= which >>>>>>> gives the following lscpi -t: >>>>>>> >>>>>>> -[0000:00]-+-00.0 >>>>>>> +-01.0 >>>>>>> +-02.0 >>>>>>> +-03.0-[01]----00.0 >>>>>>> +-04.0 >>>>>>> +-1f.0 >>>>>>> +-1f.2 >>>>>>> \-1f.3 >>>>>>> >>>>>>> All seems to work beautifully (I see the ISA bridge happily recei= ve >>>>>>> translations) until the first DMA from my device model (at 1:00.0= ) >>>>>>> arrives, at which point I get: >>>>>>> >>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fa= ult addr fffa0000 >>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entr= y is clear >>>>>>> >>>>>>> I would have expected request device 01:00.0 for this. >>>>>>> It is not clear to me yet if this is a weakness of the implementa= tion of >>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in c= ase you >>>>>>> can shed some light to it or it is an easy fix, >>>>>>> >>>>>>> The device uses pci_dma_rw with itself as device pointer. >>>>>> >>>>>> To verify my hypothesis: with this rude hack my device now works m= uch >>>>>> better: >>>>>> >>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *v= td_as, >>>>>> int bus_num, int devfn, >>>>>> is_fpd_set =3D ce.lo & VTD_CONTEXT_ENTRY_FPD; >>>>>> } else { >>>>>> ret_fr =3D dev_to_context_entry(s, bus_num, devfn, &ce); >>>>>> + if (ret_fr) >>>>>> + ret_fr =3D dev_to_context_entry(s, 1, 0, &ce); >>>>>> is_fpd_set =3D ce.lo & VTD_CONTEXT_ENTRY_FPD; >>>>>> if (ret_fr) { >>>>>> ret_fr =3D -ret_fr; >>>>>> >>>>>> Looking at how things look on hardware, multiple devices often rec= eive >>>>>> overlapping DMA address ranges for different physical addresses. >>>>>> >>>>>> So if I understand the way this works, every requester ID would al= so >>>>>> need to have it's own unique VTDAddressSpace, as each pci >>>>>> device/function sees a unique DMA address space.. >>>>> >>>>> ioh3420 is a pcie-to-pcie bridge, right?=20 >>>> >>>> Yes. >>>> >>>>> In my opinion, each pci-e >>>>> device behind the pcie-to-pcie bridge can be assigned individually.= >>>>> For now I added the VT-d to q35 by just adding it to the root pci b= us. >>>>> You can see here in q35.c: >>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu); >>>>> So if we add a pcie-to-pcie bridge, we may have to call the >>>>> pci_setup_iommu() for that new bus. I don't know where to hook into= >>>>> this now. :) If you know the mechanism behind that, you can try to = add >>>>> that for the new bus. (I will dive into this after the clean up.) >>>>> What do you think? >>>> >>>> Thanks for the quick answer, that helped a lot! >>>> >>>> Looking into the details here I realize it is slightly more complica= ted: >>>> secondary buses are enumerated after device instantiation, as part o= f >>>> the host PCI enumeration, so if I add a similar setup call in the br= idge >>>> setup, it will be called for a new device long before it has receive= d >>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] ) >>>> >>>> I agree that the lookup function for contexts needs to be as efficie= nt >>>> as possible so the simple lookup key may be the best >>>> solution but then the address_spaces table cannot be populated with = the >>>> secondary bus entries before it receives a nonzero !=3D 255 bus numb= er, >>>> eg. along the lines of this:=20 >>>> >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c >>>> index 4becdc1..d9a8c23 100644 >>>> --- a/hw/pci/pci_bridge.c >>>> +++ b/hw/pci/pci_bridge.c >>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d, >>>> pci_bridge_update_mappings(s); >>>> } >>>> =20 >>>> + if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) { >>>> + int bus_num =3D pci_bus_num(&s->sec_bus); >>>> + if (bus_num !=3D 0xff && bus_num !=3D 0x00) >>>> + >>>> + } >>>> + >>>> newctl =3D pci_get_word(d->config + PCI_BRIDGE_CONTROL); >>>> if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) { >>>> /* Trigger hot reset on 0->1 transition. */ >>>> >>>> but it is getting complicated... >>>> Thoughts? >>> >>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_= num >>> there? >> >> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead = of >> the IntelIOMMUState. >=20 > Thanks - that got me going - after some playing around with the data > structures I ended up with these patches (based on top of Jan's > vtd-intremap branch): Are you depending on interrupt remapping? If not, my patches are a bit hacky and may cause their own issues if you are unlucky. >=20 > https://github.com/knuto/qemu/tree/vtd_patches >=20 > In essence I ended up replacing the address_spaces[] array in > IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking > the VTDAddressSpace objects to serve vtd_context_device_invalidate, the= n > use pci_bus_num() whenever a bus number is requires, except for the=20 > "special" bus needed by the interrupt remapping code. >=20 > To achieve this, I had to change the signature of the pci_setup_iommu > function (the first commit). >=20 > With this I am now seeing translations for the device in the virtual > root port, but I think this should work equally well with other PCI > bridges. Good to know that we are on the right path! Maybe this can be integrated in the next posting round so that we do not need any bridge exceptions due to incompatibility. Jan --G3moOe5j3J4RFssib8J38XI5LcPfDofdA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEARECAAYFAlPySxEACgkQitSsb3rl5xTCZgCg7CC/YwSJg8UvAASfGWUvdt6W brwAnAtRmcX+2+WSn2PrkIVbBfqwtfZL =I12e -----END PGP SIGNATURE----- --G3moOe5j3J4RFssib8J38XI5LcPfDofdA--