From mboxrd@z Thu Jan 1 00:00:00 1970
From: yang.shi@linaro.org (Shi, Yang)
Date: Fri, 15 Apr 2016 10:55:14 -0700
Subject: SMMU problem found on LS2085A with 4.6-rc3
In-Reply-To: <57112873.5030903@arm.com>
References: <570E9EA9.2070903@linaro.org> <570F8739.1060508@arm.com>
 <571022AD.1060909@linaro.org> <5710DED6.3010008@arm.com>
 <571122A3.80909@linaro.org> <57112873.5030903@arm.com>
Message-ID: <57112B02.6020402@linaro.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 4/15/2016 10:44 AM, Robin Murphy wrote:
> On 15/04/16 18:19, Shi, Yang wrote:
>> On 4/15/2016 5:30 AM, Robin Murphy wrote:
>>> On 15/04/16 00:07, Shi, Yang wrote:
>>>> Hi Robin,
>>>>
>>>> On 4/14/2016 5:04 AM, Robin Murphy wrote:
>>>>> Hi Yang,
>>>>>
>>>>> On 13/04/16 20:31, Shi, Yang wrote:
>>>>>> Hi Will & Robin,
>>>>>>
>>>>>> I just ran some quick test on my LS2085A board, which has 8 Cortex A57
>>>>>> cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU.
>>>>>>
>>>>>> SMMU driver reports:
>>>>>>
>>>>>> arm_smmu_global_fault: 297974 callbacks suppressed
>>>>>> arm_smmu_global_fault: 298561 callbacks suppressed
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008,
>>>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>
>>>>> That's a stream match conflict fault, so you've somehow got two devices
>>>>> using the same stream ID attached to different domains, and at least one
>>>>> of them is trying to do DMA.
>>>>>
>>>>>> But, it is good with 4.5 kernel. I found the below commit causes it:
>>>>>>
>>>>>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4
>>>>>> Author: Robin Murphy
>>>>>> Date: Tue Jan 26 18:06:36 2016 +0000
>>>>>>
>>>>>>     iommu/arm-smmu: Support DMA-API domains
>>>>>>
>>>>>>     With DMA mapping ops provided by the iommu-dma code, only a minimal
>>>>>>     contribution from the IOMMU driver is needed to create a suitable
>>>>>>     DMA-API domain for them to use. Implement this for the ARM SMMUs.
>>>>>>
>>>>>>     Signed-off-by: Robin Murphy
>>>>>>     Signed-off-by: Will Deacon
>>>>>>
>>>>>> Any idea?
>>>>>
>>>>> My first guess would be the same thing as [1] - does that patch help?
>>>>
>>>> No, it doesn't stop the fault.
>>>>
>>>>>
>>>>> Beyond that, what does your DT look like? The one in mainline has one
>>>>> token mmu-masters property which isn't even valid, so nothing ever gets
>>>>
>>>> Mine has mmu-masters property too, but removing it doesn't solve the
>>>> problem.
>>>
>>> OK, now things really stop making sense. Without the mmu-masters
>>> property the SMMU driver will do nothing but probe the SMMU device
>>> itself. Therefore I can only assume the bootloader magic for the
>>> Freescale vendor kernel must be rewriting your DT (our board always just
>>> says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
>>> DT containing the SMMU, so I'm not sure exactly what it's looking for).
>>> Can you see what it's done via /sys/firmware/fdt (or
>>> /sys/firmware/devicetree/base/ if you can face hunting down phandles
>>> manually)?
>>> As a further sanity check, what do you see in
>>> /sys/kernel/iommu_groups/*/devices/ and do they differ between the two
>>> kernels?
>>
>> With the mmu-masters property, the fsl-mc will be added into group 2,
>> please see the below dmesg log:
>>
>> iommu: Adding device 3600000.pcie to group 0
>> iommu: Adding device 3700000.pcie to group 1
>> iommu: Adding device 80c000000.fsl-mc to group 2
>>
>> fsl-mc won't be there if mmu-masters property is removed.
>>
>> But, it looks like there are multiple devices in iommu_groups 3:
>>
>> root@ls2085a_rdb-4:~# cat /sys/kernel/iommu_groups/3/devices/0000\:0
>> 0000:00:00.0/ 0000:01:00.0/ 0000:01:00.1/
>>
>> It is group 2 if mmu-masters property is removed.
>>
>> I have one Intel e1000 NIC on my PCIe bus, both 0000:01:00.0 and
>> 0000:01:00.1 are for the NIC.
>>
>> Is this behavior expected?
>
> Yup - since the root complex doesn't support ACS, the IOMMU API puts all
> the devices behind it (in this case the NIC and the bridge itself) in
> the same group, because otherwise it might be possible for two devices
> assigned to different guests to DMA directly to each other without going
> through the IOMMU.
>
>>> Secondly, the stream match conflict can only occur if the SMRs are
>>> actually programmed. Since with Will's fix for the conflicts Eric saw we
>>> should attach to the default domain without touching the initial bypass
>>> entries in the SMRs, I'm at a loss to see how you could still get into
>>> this state with that patch applied.
>>
>> I mixed up the patch; with Will's fix applied, the issue has gone away.
>
> Phew, that's a relief! Would you be happy to give a Tested-by on that
> patch?

Sure, just added my Tested-by to that patch. Thanks for your help.

Regards,
Yang

>
> Thanks,
> Robin.
>
>>
>> Thanks,
>> Yang
>>
>>>
>>> As the transactions provoking the fault are apparently instruction
>>> fetches on a 00xx stream ID, which I've not seen before, my first guess
>>> would be it's something to do with the management complex (which I can't
>>> get to work with the staging driver due to firmware incompatibility),
>>> but then looking at the general lack of connection to the DMA API within
>>> that driver, maybe not?
>>>
>>> Robin.
>>>
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>
>>>>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A
>>>>> earlier this week - so there's clearly something going on there.
>>>>>
>>>>> More generally, I'd note that the mmu-masters binding will never fully
>>>>> work on this board - you can get the platform devices to cooperate by
>>>>> programming the assorted ICID registers to ensure they present unique
>>>>> stream IDs, but PCI devices cannot work at all because there's no way to
>>>>> make the stream IDs coming out of the root complex be equal to the PCI
>>>>> RID in the way it relies on. In that sense, any regression here is quite
>>>>> likely just a shift from "subtly not working" to "loudly and obnoxiously
>>>>> not working". Conversely, those reasons have also proved it a really
>>>>> useful platform for implementing and testing the iommu-map binding[2]
>>>>> (with an awful hack in the PCI driver to program the lookup table
>>>>> suitably) :D
>>>>>
>>>>> Robin.
>>>>>
>>>>> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810
>>>>> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yang
>>>>>>
>>>>>
>>>>
>>>
>>
>
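
For anyone reading the fault log in this thread later, here is a rough
sketch of how the GFSR and GFSYNR0 values map onto the diagnosis given
above ("stream match conflict", "instruction fetches"). The SMCF and IND
bit positions are consistent with that diagnosis; the USF, WNR and MULTI
positions are assumptions from the SMMUv2 register layout and should be
checked against the architecture specification. The names decode,
GFSR_BITS and GFSYNR0_BITS are made up for this example and are not
driver symbols.

```python
# Assumed bit names for the SMMUv2 global fault registers. Only the bits
# relevant to the log in this thread are named; anything else is reported
# generically as "bit N".
GFSR_BITS = {
    1: "USF (unidentified stream fault)",      # assumption
    2: "SMCF (stream match conflict fault)",   # matches the diagnosis above
    31: "MULTI (multiple faults recorded)",    # assumption
}
GFSYNR0_BITS = {
    1: "WNR (write, not read)",                # assumption
    3: "IND (instruction fetch)",              # matches the diagnosis above
}


def decode(value, names):
    """Return a label for every set bit in a 32-bit register value."""
    return [
        names.get(bit, f"bit {bit}")
        for bit in range(32)
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    # Values copied from the log quoted in this thread.
    print("GFSR   :", decode(0x80000004, GFSR_BITS))
    print("GFSYNR0:", decode(0x00000008, GFSYNR0_BITS))
```

Under those assumptions, GFSR 0x80000004 decodes to SMCF plus MULTI, and
GFSYNR0 0x00000008 to IND, which lines up with the reading of the log
given earlier in the thread.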