On 04/11/2013 02:13 AM, Jan Beulich wrote: >>>> On 11.04.13 at 03:51, Suravee Suthikulpanit wrote: >> - From drivers/passthrough/amd/iommu_init.c: enable_iommu(), the >> enabling code tries to set MSI affinity for IOMMU. >> - From xen/arch/x86/msi.c: set_msi_affinity(), the code tries to >> "read_msi_msg()" which eventually calls "amd_iommu_read_msi_from_ire()". >> - Please note that the "amd_iommu_read_msi_from_ire()" was originally >> empty prior the patch. >> - From the patch, the new code try to call "get_intremap_entry()", which >> is the function that was asserting due to table is NULL. > But even before the patch update_intremap_entry_from_msi_msg() > also necessarily called get_intremap_entry() for the IOMMU device > itself. And the respective write_msi_msg() call follows right after the > read_msi_msg() one, i.e. there's no room for other setup to be done > by that point. Plus the assertion was there already. > >> Here, the MSI read message was sent to read IOMMU registers. Since the >> message is targeting for IOMMU itself (bdf 0x2 in this case), it should >> not be remapped. That is why I believe the interrupt remapping table is >> empty. > Whether or not the MSI for the IOMMU device itself needs to be > remapped is an independent question, but judging from the > current code as well as from the specification not saying otherwise > it needs to be. Confirmation on this would surely be appreciated > (because otherwise (a) I'd need to understand how this special > case works with the code before the patch and (b) new special > casing may need to be added). > > In any case - I think I said this before already - seeing a full log up > to the crash with "iommu=debug" in place might help. After all the > ACPI tables ought to mention the IOMMU PCI device in some way, > and hence there ought to be an IVRS mapping as well as an > associated intremap table. Or else the only way this can work > currently would be by way of find_iommu_for_device() returning > NULL for the IOMMU PCI device (which would mean a similar check > would need to be added to amd_iommu_read_msi_from_ire(), but > it would then be bogus for the failure there to cause a log message > to be issued if iommu_debug - that should be trivial to verify on > an unpatched or older [say 4.2.x based] hypervisor passing > "iommu=debug"). > > Jan > I have attached the xl dmesg output from the machine which run the non-patched Xen (w/ iommu=debug). It is showing that there is no IVHD entry setting up for the bdf 0x2 (the IOMMU). Also, there is a line "AMD-Vi: Fail to find iommu for MSI device id = 0x2) which can be traced to xen/drivers/passthrough/amd/iommu_intr.c: amd_iommu_msi_msg_update_ire() which is called from xen/arch/x86/msi.c: write_msi_msg(). When comparing with other AMD systems in the past (i.e. with SR56XX chipset), the IVHD contains IVHD entries such as below: (XEN) AMD-Vi: IVHD Device Entry: type 0x3 id 0 flags 0 (XEN) AMD-Vi: Dev_Id Range: 0 -> 0x2 which will generate the mapping entry for bfd 0x0, 0x1, 0x2. Suravee PS: I could not get the serial console out from the trouble target system (Trinity based), so I could not get the crash message. But from the prior email I sent to first report the issue, you can see that it failed during "enable_iommu".