* SMMU problem found on LS2085A with 4.6-rc3
@ 2016-04-13 19:31 Shi, Yang
  2016-04-14 12:04 ` Robin Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: Shi, Yang @ 2016-04-13 19:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will & Robin,

I just ran some quick test on my LS2085A board, which has 8 Cortex A57
cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU.

SMMU driver reports:

arm_smmu_global_fault: 297974 callbacks suppressed
arm_smmu_global_fault: 298561 callbacks suppressed
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
arm-smmu 5000000.iommu: Unexpected global fault, this could be serious

But, it is good with 4.5 kernel. I found the below commit causes it:

commit 9adb95949a343dac53b1cd81dc973b5f815c88d4
Author: Robin Murphy <robin.murphy@arm.com>
Date:   Tue Jan 26 18:06:36 2016 +0000

    iommu/arm-smmu: Support DMA-API domains

    With DMA mapping ops provided by the iommu-dma code, only a minimal
    contribution from the IOMMU driver is needed to create a suitable
    DMA-API domain for them to use. Implement this for the ARM SMMUs.

    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

Any idea?

Thanks,
Yang

^ permalink raw reply [flat|nested] 7+ messages in thread
* SMMU problem found on LS2085A with 4.6-rc3 2016-04-13 19:31 SMMU problem found on LS2085A with 4.6-rc3 Shi, Yang @ 2016-04-14 12:04 ` Robin Murphy 2016-04-14 23:07 ` Shi, Yang 0 siblings, 1 reply; 7+ messages in thread From: Robin Murphy @ 2016-04-14 12:04 UTC (permalink / raw) To: linux-arm-kernel Hi Yang, On 13/04/16 20:31, Shi, Yang wrote: > Hi Will & Robin, > > I just ran some quick test on my LS2085A board, which has 8 Cortex A57 > cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU. > > SMMU driver reports: > > arm_smmu_global_fault: 297974 callbacks suppressed > arm_smmu_global_fault: 298561 callbacks suppressed > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, > GFSYNR1 0x00000300, GFSYNR2 0x00000000 > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 
5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > arm-smmu 5000000.iommu: Unexpected global fault, this could be serious That's a stream match conflict fault, so you've somehow got two devices using the same stream ID attached to different domains, and at least one of them is trying to do DMA. > But, it is good with 4.5 kernel. I found the below commit causes it: > > commit 9adb95949a343dac53b1cd81dc973b5f815c88d4 > Author: Robin Murphy <robin.murphy@arm.com> > Date: Tue Jan 26 18:06:36 2016 +0000 > > iommu/arm-smmu: Support DMA-API domains > > With DMA mapping ops provided by the iommu-dma code, only a minimal > contribution from the IOMMU driver is needed to create a suitable > DMA-API domain for them to use. Implement this for the ARM SMMUs. > > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > Signed-off-by: Will Deacon <will.deacon@arm.com> > > Any idea? My first guess would be the same thing as [1] - does that patch help? Beyond that, what does your DT look like? The one in mainline has one token mmu-masters property which isn't even valid, so nothing ever gets attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A earlier this week - so there's clearly something going on there. More generally, I'd note that the mmu-masters binding will never fully work on this board - you can get the platform devices to cooperate by programming the assorted ICID registers to ensure they present unique stream IDs, but PCI devices cannot work at all because there's no way to make the stream IDs coming out of the root complex be equal to the PCI RID in the way it relies on. 
In that sense, any regression here is quite likely just a shift from "subtly not working" to "loudly and obnoxiously not working". Conversely, those reasons have also proved it a really useful platform for implementing and testing the iommu-map binding[2] (with an awful hack in the PCI driver to program the lookup table suitably) :D Robin. [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810 [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454 > > Thanks, > Yang > ^ permalink raw reply [flat|nested] 7+ messages in thread
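For anyone reading along, here is a rough sketch of how the "stream match conflict" reading can be derived from the register dump quoted above. It is not taken from the driver: the bit positions and names in `GFSR_BITS` and `GFSYNR0_BITS` are assumptions based on the ARM SMMUv2 architecture register layout and should be checked against the reference manual for your SMMU. The helper name `decode` is purely illustrative.

```python
# Sketch only: decode the global fault registers printed by the arm-smmu driver.
# Bit assignments below are assumed from the ARM SMMUv2 architecture layout;
# verify them against the manual for the SMMU in question.

GFSR_BITS = {
    0: "ICF (invalid context fault)",
    1: "USF (unidentified stream fault)",
    2: "SMCF (stream match conflict fault)",
    3: "UCBF (unimplemented context bank fault)",
    4: "UCIF (unimplemented context interrupt fault)",
    5: "CAF (configuration access fault)",
    6: "EF (external fault)",
    7: "PF (permission fault)",
    8: "UUT (unsupported upstream transaction)",
    31: "MULTI (multiple faults recorded)",
}

GFSYNR0_BITS = {
    0: "NESTED",
    1: "WNR (write, not read)",
    2: "PNU (privileged)",
    3: "IND (instruction fetch)",
    4: "NSSTATE",
    5: "NSATTR",
}


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the log in this thread.
    for name in decode(0x80000004, GFSR_BITS):
        print("GFSR:   ", name)
    for name in decode(0x00000008, GFSYNR0_BITS):
        print("GFSYNR0:", name)
```

Under those assumptions, GFSR 0x80000004 has the SMCF and MULTI bits set, and GFSYNR0 0x00000008 has only IND set, which is consistent with a repeated stream match conflict triggered by instruction fetches.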
* SMMU problem found on LS2085A with 4.6-rc3 2016-04-14 12:04 ` Robin Murphy @ 2016-04-14 23:07 ` Shi, Yang 2016-04-15 12:30 ` Robin Murphy 0 siblings, 1 reply; 7+ messages in thread From: Shi, Yang @ 2016-04-14 23:07 UTC (permalink / raw) To: linux-arm-kernel Hi Robin, On 4/14/2016 5:04 AM, Robin Murphy wrote: > Hi Yang, > > On 13/04/16 20:31, Shi, Yang wrote: >> Hi Will & Robin, >> >> I just ran some quick test on my LS2085A board, which has 8 Cortex A57 >> cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU. >> >> SMMU driver reports: >> >> arm_smmu_global_fault: 297974 callbacks suppressed >> arm_smmu_global_fault: 298561 callbacks suppressed >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global 
fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious > > That's a stream match conflict fault, so you've somehow got two devices > using the same stream ID attached to different domains, and at least one > of them is trying to do DMA. > >> But, it is good with 4.5 kernel. I found the below commit causes it: >> >> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4 >> Author: Robin Murphy <robin.murphy@arm.com> >> Date: Tue Jan 26 18:06:36 2016 +0000 >> >> iommu/arm-smmu: Support DMA-API domains >> >> With DMA mapping ops provided by the iommu-dma code, only a minimal >> contribution from the IOMMU driver is needed to create a suitable >> DMA-API domain for them to use. Implement this for the ARM SMMUs. >> >> Signed-off-by: Robin Murphy <robin.murphy@arm.com> >> Signed-off-by: Will Deacon <will.deacon@arm.com> >> >> Any idea? > > My first guess would be the same thing as [1] - does that patch help? No, it can't cease the fault. > > Beyond that, what does your DT look like? The one in mainline has one > token mmu-masters property which isn't even valid, so nothing ever gets Mine has mmu-masters property too, but removing it doesn't solve the problem. Thanks, Yang > attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A > earlier this week - so there's clearly something going on there. 
> > More generally, I'd note that the mmu-masters binding will never fully > work on this board - you can get the platform devices to cooperate by > programming the assorted ICID registers to ensure they present unique > stream IDs, but PCI devices cannot work at all because there's no way to > make the stream IDs coming out of the root complex be equal to the PCI > RID in the way it relies on. In that sense, any regression here is quite > likely just a shift from "subtly not working" to "loudly and obnoxiously > not working". Conversely, those reasons have also proved it a really > useful platform for implementing and testing the iommu-map binding[2] > (with an awful hack in the PCI driver to program the lookup table > suitably) :D > > Robin. > > [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810 > [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454 > >> >> Thanks, >> Yang >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* SMMU problem found on LS2085A with 4.6-rc3 2016-04-14 23:07 ` Shi, Yang @ 2016-04-15 12:30 ` Robin Murphy 2016-04-15 17:19 ` Shi, Yang 0 siblings, 1 reply; 7+ messages in thread From: Robin Murphy @ 2016-04-15 12:30 UTC (permalink / raw) To: linux-arm-kernel On 15/04/16 00:07, Shi, Yang wrote: > Hi Robin, > > On 4/14/2016 5:04 AM, Robin Murphy wrote: >> Hi Yang, >> >> On 13/04/16 20:31, Shi, Yang wrote: >>> Hi Will & Robin, >>> >>> I just ran some quick test on my LS2085A board, which has 8 Cortex A57 >>> cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU. >>> >>> SMMU driver reports: >>> >>> arm_smmu_global_fault: 297974 callbacks suppressed >>> arm_smmu_global_fault: 298561 callbacks suppressed >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected 
global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >> >> That's a stream match conflict fault, so you've somehow got two devices >> using the same stream ID attached to different domains, and at least one >> of them is trying to do DMA. >> >>> But, it is good with 4.5 kernel. I found the below commit causes it: >>> >>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4 >>> Author: Robin Murphy <robin.murphy@arm.com> >>> Date: Tue Jan 26 18:06:36 2016 +0000 >>> >>> iommu/arm-smmu: Support DMA-API domains >>> >>> With DMA mapping ops provided by the iommu-dma code, only a minimal >>> contribution from the IOMMU driver is needed to create a suitable >>> DMA-API domain for them to use. Implement this for the ARM SMMUs. >>> >>> Signed-off-by: Robin Murphy <robin.murphy@arm.com> >>> Signed-off-by: Will Deacon <will.deacon@arm.com> >>> >>> Any idea? >> >> My first guess would be the same thing as [1] - does that patch help? > > No, it can't cease the fault. > >> >> Beyond that, what does your DT look like? The one in mainline has one >> token mmu-masters property which isn't even valid, so nothing ever gets > > Mine has mmu-masters property too, but removing it doesn't solve the > problem. OK, now things really stop making sense. Without the mmu-masters property the SMMU driver will do nothing but probe the SMMU device itself. 
Therefore I can only assume the bootloader magic for the
Freescale vendor kernel must be rewriting your DT (our board always just
says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
DT containing the SMMU, so I'm not sure exactly what it's looking for).
Can you see what it's done via /sys/firmware/fdt (or
/sys/firmware/devicetree/base/ if you can face hunting down phandles
manually)? As a further sanity check, what do you see in
/sys/kernel/iommu_groups/*/devices/ and do they differ between the two
kernels?

Secondly, the stream match conflict can only occur if the SMRs are
actually programmed. Since with Will's fix for the conflicts Eric saw we
should attach to the default domain without touching the initial bypass
entries in the SMRs, I'm at a loss to see how you could still get into
this state with that patch applied.

As the transactions provoking the fault are apparently instruction
fetches on a 00xx stream ID, which I've not seen before, my first guess
would be it's something to do with the management complex (which I can't
get to work with the staging driver due to firmware incompatibility),
but then looking at the general lack of connection to the DMA API within
that driver, maybe not?

Robin.

>
> Thanks,
> Yang
>
>
>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A
>> earlier this week - so there's clearly something going on there.
>>
>> More generally, I'd note that the mmu-masters binding will never fully
>> work on this board - you can get the platform devices to cooperate by
>> programming the assorted ICID registers to ensure they present unique
>> stream IDs, but PCI devices cannot work at all because there's no way to
>> make the stream IDs coming out of the root complex be equal to the PCI
>> RID in the way it relies on. In that sense, any regression here is quite
>> likely just a shift from "subtly not working" to "loudly and obnoxiously
>> not working".
Conversely, those reasons have also proved it a really >> useful platform for implementing and testing the iommu-map binding[2] >> (with an awful hack in the PCI driver to program the lookup table >> suitably) :D >> >> Robin. >> >> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810 >> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454 >> >>> >>> Thanks, >>> Yang >>> >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
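The sanity check requested above (what is in /sys/kernel/iommu_groups/*/devices/, and does it differ between the two kernels) could be captured with something like the following. This is only a sketch: it assumes the usual sysfs layout of one numbered directory per group with a `devices` subdirectory, and the function names `list_iommu_groups` and `changed_devices` are made up for illustration.

```python
# Sketch only: snapshot IOMMU group membership from sysfs so that the
# output of two kernels can be compared. Function names are hypothetical.
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each numeric group ID under `root` to its sorted device names."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # no IOMMU groups exposed on this system
    for name in os.listdir(root):
        devdir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devdir):
            groups[int(name)] = sorted(os.listdir(devdir))
    return groups


def changed_devices(old, new):
    """Return {device: (old_group, new_group)} for devices whose group differs.

    A group of None means the device was absent from that snapshot.
    """
    def by_device(groups):
        return {dev: gid for gid, devs in groups.items() for dev in devs}

    before, after = by_device(old), by_device(new)
    return {
        dev: (before.get(dev), after.get(dev))
        for dev in sorted(set(before) | set(after))
        if before.get(dev) != after.get(dev)
    }
```

Running `list_iommu_groups()` once under each kernel, saving the result, and passing both snapshots to `changed_devices()` would show which devices moved, appeared, or disappeared.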
* SMMU problem found on LS2085A with 4.6-rc3 2016-04-15 12:30 ` Robin Murphy @ 2016-04-15 17:19 ` Shi, Yang 2016-04-15 17:44 ` Robin Murphy 0 siblings, 1 reply; 7+ messages in thread From: Shi, Yang @ 2016-04-15 17:19 UTC (permalink / raw) To: linux-arm-kernel On 4/15/2016 5:30 AM, Robin Murphy wrote: > On 15/04/16 00:07, Shi, Yang wrote: >> Hi Robin, >> >> On 4/14/2016 5:04 AM, Robin Murphy wrote: >>> Hi Yang, >>> >>> On 13/04/16 20:31, Shi, Yang wrote: >>>> Hi Will & Robin, >>>> >>>> I just ran some quick test on my LS2085A board, which has 8 Cortex A57 >>>> cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU. >>>> >>>> SMMU driver reports: >>>> >>>> arm_smmu_global_fault: 297974 callbacks suppressed >>>> arm_smmu_global_fault: 298561 callbacks suppressed >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: 
Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>> >>> That's a stream match conflict fault, so you've somehow got two devices >>> using the same stream ID attached to different domains, and at least one >>> of them is trying to do DMA. >>> >>>> But, it is good with 4.5 kernel. I found the below commit causes it: >>>> >>>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4 >>>> Author: Robin Murphy <robin.murphy@arm.com> >>>> Date: Tue Jan 26 18:06:36 2016 +0000 >>>> >>>> iommu/arm-smmu: Support DMA-API domains >>>> >>>> With DMA mapping ops provided by the iommu-dma code, only a >>>> minimal >>>> contribution from the IOMMU driver is needed to create a suitable >>>> DMA-API domain for them to use. Implement this for the ARM SMMUs. >>>> >>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com> >>>> Signed-off-by: Will Deacon <will.deacon@arm.com> >>>> >>>> Any idea? >>> >>> My first guess would be the same thing as [1] - does that patch help? >> >> No, it can't cease the fault. >> >>> >>> Beyond that, what does your DT look like? The one in mainline has one >>> token mmu-masters property which isn't even valid, so nothing ever gets >> >> Mine has mmu-masters property too, but removing it doesn't solve the >> problem. > > OK, now things really stop making sense. Without the mmu-masters > property the SMMU driver will do nothing but probe the SMMU device > itself. 
Therefore I can only assume the bootloader magic for the
> Freescale vendor kernel must be rewriting your DT (our board always just
> says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
> DT containing the SMMU, so I'm not sure exactly what it's looking for).
> Can you see what it's done via /sys/firmware/fdt (or
> /sys/firmware/devicetree/base/ if you can face hunting down phandles
> manually)? As a further sanity check, what do you see in
> /sys/kernel/iommu_groups/*/devices/ and do they differ between the two
> kernels?

With the mmu-masters property, the fsl-mc will be added into group 2,
please see the below dmesg log:

iommu: Adding device 3600000.pcie to group 0
iommu: Adding device 3700000.pcie to group 1
iommu: Adding device 80c000000.fsl-mc to group 2

fsl-mc won't be there if the mmu-masters property is removed.

But it looks like there are multiple devices in iommu_groups 3:

root@ls2085a_rdb-4:~# cat /sys/kernel/iommu_groups/3/devices/0000\:0
0000:00:00.0/ 0000:01:00.0/ 0000:01:00.1/

It is group 2 if the mmu-masters property is removed.

I have one Intel e1000 NIC on my PCIe bus; both 0000:01:00.0 and
0000:01:00.1 are for the NIC.

Is this behavior expected?

>
> Secondly, the stream match conflict can only occur if the SMRs are
> actually programmed. Since with Will's fix for the conflicts Eric saw we
> should attach to the default domain without touching the initial bypass
> entries in the SMRs, I'm at a loss to see how you could still get into
> this state with that patch applied.

I mixed up the patch; with Will's fix applied, the issue has gone away.

Thanks,
Yang

>
> As the transactions provoking the fault are apparently instruction
> fetches on a 00xx stream ID, which I've not seen before, my first guess
> would be it's something to do with the management complex (which I can't
> get to work with the staging driver due to firmware incompatibility),
> but then looking at the general lack of connection to the DMA API within
> that driver, maybe not?
> > Robin. > >> >> Thanks, >> Yang >> >> >>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A >>> earlier this week - so there's clearly something going on there. >>> >>> More generally, I'd note that the mmu-masters binding will never fully >>> work on this board - you can get the platform devices to cooperate by >>> programming the assorted ICID registers to ensure they present unique >>> stream IDs, but PCI devices cannot work at all because there's no way to >>> make the stream IDs coming out of the root complex be equal to the PCI >>> RID in the way it relies on. In that sense, any regression here is quite >>> likely just a shift from "subtly not working" to "loudly and obnoxiously >>> not working". Conversely, those reasons have also proved it a really >>> useful platform for implementing and testing the iommu-map binding[2] >>> (with an awful hack in the PCI driver to program the lookup table >>> suitably) :D >>> >>> Robin. >>> >>> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810 >>> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454 >>> >>>> >>>> Thanks, >>>> Yang >>>> >>> >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* SMMU problem found on LS2085A with 4.6-rc3 2016-04-15 17:19 ` Shi, Yang @ 2016-04-15 17:44 ` Robin Murphy 2016-04-15 17:55 ` Shi, Yang 0 siblings, 1 reply; 7+ messages in thread From: Robin Murphy @ 2016-04-15 17:44 UTC (permalink / raw) To: linux-arm-kernel On 15/04/16 18:19, Shi, Yang wrote: > On 4/15/2016 5:30 AM, Robin Murphy wrote: >> On 15/04/16 00:07, Shi, Yang wrote: >>> Hi Robin, >>> >>> On 4/14/2016 5:04 AM, Robin Murphy wrote: >>>> Hi Yang, >>>> >>>> On 13/04/16 20:31, Shi, Yang wrote: >>>>> Hi Will & Robin, >>>>> >>>>> I just ran some quick test on my LS2085A board, which has 8 Cortex A57 >>>>> cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU. >>>>> >>>>> SMMU driver reports: >>>>> >>>>> arm_smmu_global_fault: 297974 callbacks suppressed >>>>> arm_smmu_global_fault: 298561 callbacks suppressed >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, >>>>> GFSYNR1 0x00000300, GFSYNR2 0x00000000 >>>>> arm-smmu 5000000.iommu: Unexpected 
global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious >>>> >>>> That's a stream match conflict fault, so you've somehow got two devices >>>> using the same stream ID attached to different domains, and at least >>>> one >>>> of them is trying to do DMA. >>>> >>>>> But, it is good with 4.5 kernel. I found the below commit causes it: >>>>> >>>>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4 >>>>> Author: Robin Murphy <robin.murphy@arm.com> >>>>> Date: Tue Jan 26 18:06:36 2016 +0000 >>>>> >>>>> iommu/arm-smmu: Support DMA-API domains >>>>> >>>>> With DMA mapping ops provided by the iommu-dma code, only a >>>>> minimal >>>>> contribution from the IOMMU driver is needed to create a suitable >>>>> DMA-API domain for them to use. Implement this for the ARM SMMUs. >>>>> >>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com> >>>>> Signed-off-by: Will Deacon <will.deacon@arm.com> >>>>> >>>>> Any idea? >>>> >>>> My first guess would be the same thing as [1] - does that patch help? >>> >>> No, it can't cease the fault. >>> >>>> >>>> Beyond that, what does your DT look like? The one in mainline has one >>>> token mmu-masters property which isn't even valid, so nothing ever gets >>> >>> Mine has mmu-masters property too, but removing it doesn't solve the >>> problem. >> >> OK, now things really stop making sense. 
Without the mmu-masters
>> property the SMMU driver will do nothing but probe the SMMU device
>> itself. Therefore I can only assume the bootloader magic for the
>> Freescale vendor kernel must be rewriting your DT (our board always just
>> says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
>> DT containing the SMMU, so I'm not sure exactly what it's looking for).
>> Can you see what it's done via /sys/firmware/fdt (or
>> /sys/firmware/devicetree/base/ if you can face hunting down phandles
>> manually)? As a further sanity check, what do you see in
>> /sys/kernel/iommu_groups/*/devices/ and do they differ between the two
>> kernels?
>
> With the mmu-masters property, the fsl-mc will be added into group 2,
> please see the below dmesg log:
>
> iommu: Adding device 3600000.pcie to group 0
> iommu: Adding device 3700000.pcie to group 1
> iommu: Adding device 80c000000.fsl-mc to group 2
>
> fsl-mc won't be there if the mmu-masters property is removed.
>
> But it looks like there are multiple devices in iommu_groups 3:
>
> root@ls2085a_rdb-4:~# cat /sys/kernel/iommu_groups/3/devices/0000\:0
> 0000:00:00.0/ 0000:01:00.0/ 0000:01:00.1/
>
> It is group 2 if the mmu-masters property is removed.
>
> I have one Intel e1000 NIC on my PCIe bus; both 0000:01:00.0 and
> 0000:01:00.1 are for the NIC.
>
> Is this behavior expected?

Yup - since the root complex doesn't support ACS, the IOMMU API puts all
the devices behind it (in this case the NIC and the bridge itself) in
the same group, because otherwise it might be possible for two devices
assigned to different guests to DMA directly to each other without going
through the IOMMU.

>> Secondly, the stream match conflict can only occur if the SMRs are
>> actually programmed. Since with Will's fix for the conflicts Eric saw we
>> should attach to the default domain without touching the initial bypass
>> entries in the SMRs, I'm at a loss to see how you could still get into
>> this state with that patch applied.
> > I mixed up the patch, with Will's fix applied, the issue is gone away. Phew, that's a relief! Would you be happy to give a Tested-by on that patch? Thanks, Robin. > > Thanks, > Yang > >> >> As the transactions provoking the fault are apparently instruction >> fetches on a 00xx stream ID, which I've not seen before, my first guess >> would be it's something to do with the management complex (which I can't >> get to work with the staging driver due to firmware incompatibility), >> but then looking at the general lack of connection to the DMA API within >> that driver, maybe not? >> >> Robin. >> >>> >>> Thanks, >>> Yang >>> >>> >>>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A >>>> earlier this week - so there's clearly something going on there. >>>> >>>> More generally, I'd note that the mmu-masters binding will never fully >>>> work on this board - you can get the platform devices to cooperate by >>>> programming the assorted ICID registers to ensure they present unique >>>> stream IDs, but PCI devices cannot work at all because there's no >>>> way to >>>> make the stream IDs coming out of the root complex be equal to the PCI >>>> RID in the way it relies on. In that sense, any regression here is >>>> quite >>>> likely just a shift from "subtly not working" to "loudly and >>>> obnoxiously >>>> not working". Conversely, those reasons have also proved it a really >>>> useful platform for implementing and testing the iommu-map binding[2] >>>> (with an awful hack in the PCI driver to program the lookup table >>>> suitably) :D >>>> >>>> Robin. >>>> >>>> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810 >>>> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454 >>>> >>>>> >>>>> Thanks, >>>>> Yang >>>>> >>>> >>> >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* SMMU problem found on LS2085A with 4.6-rc3
  2016-04-15 17:44 ` Robin Murphy
@ 2016-04-15 17:55   ` Shi, Yang
  0 siblings, 0 replies; 7+ messages in thread
From: Shi, Yang @ 2016-04-15 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 4/15/2016 10:44 AM, Robin Murphy wrote:
> On 15/04/16 18:19, Shi, Yang wrote:
>> On 4/15/2016 5:30 AM, Robin Murphy wrote:
>>> On 15/04/16 00:07, Shi, Yang wrote:
>>>> Hi Robin,
>>>>
>>>> On 4/14/2016 5:04 AM, Robin Murphy wrote:
>>>>> Hi Yang,
>>>>>
>>>>> On 13/04/16 20:31, Shi, Yang wrote:
>>>>>> Hi Will & Robin,
>>>>>>
>>>>>> I just ran some quick test on my LS2085A board, which has 8 Cortex A57
>>>>>> cores, with 4.6-rc3 kernel, but I found a regression issue with SMMU.
>>>>>>
>>>>>> SMMU driver reports:
>>>>>>
>>>>>> arm_smmu_global_fault: 297974 callbacks suppressed
>>>>>> arm_smmu_global_fault: 298561 callbacks suppressed
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>
>>>>> That's a stream match conflict fault, so you've somehow got two devices
>>>>> using the same stream ID attached to different domains, and at least one
>>>>> of them is trying to do DMA.
>>>>>
>>>>>> But, it is good with 4.5 kernel. I found the below commit causes it:
>>>>>>
>>>>>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4
>>>>>> Author: Robin Murphy <robin.murphy@arm.com>
>>>>>> Date:   Tue Jan 26 18:06:36 2016 +0000
>>>>>>
>>>>>>     iommu/arm-smmu: Support DMA-API domains
>>>>>>
>>>>>>     With DMA mapping ops provided by the iommu-dma code, only a minimal
>>>>>>     contribution from the IOMMU driver is needed to create a suitable
>>>>>>     DMA-API domain for them to use. Implement this for the ARM SMMUs.
>>>>>>
>>>>>>     Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>>>     Signed-off-by: Will Deacon <will.deacon@arm.com>
>>>>>>
>>>>>> Any idea?
>>>>>
>>>>> My first guess would be the same thing as [1] - does that patch help?
>>>>
>>>> No, it can't cease the fault.
>>>>
>>>>>
>>>>> Beyond that, what does your DT look like? The one in mainline has one
>>>>> token mmu-masters property which isn't even valid, so nothing ever gets
>>>>
>>>> Mine has mmu-masters property too, but removing it doesn't solve the
>>>> problem.
>>>
>>> OK, now things really stop making sense. Without the mmu-masters
>>> property the SMMU driver will do nothing but probe the SMMU device
>>> itself. Therefore I can only assume the bootloader magic for the
>>> Freescale vendor kernel must be rewriting your DT (our board always just
>>> says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
>>> DT containing the SMMU, so I'm not sure exactly what it's looking for).
>>> Can you see what it's done via /sys/firmware/fdt (or
>>> /sys/firmware/devicetree/base/ if you can face hunting down phandles
>>> manually)? As a further sanity check, what do you see in
>>> /sys/kernel/iommu_groups/*/devices/ and do they differ between the two
>>> kernels?
>>
>> With the mmu-masters property, the fsl-mc will be added into group 2,
>> please see the below dmesg log:
>>
>> iommu: Adding device 3600000.pcie to group 0
>> iommu: Adding device 3700000.pcie to group 1
>> iommu: Adding device 80c000000.fsl-mc to group 2
>>
>> fsl-mc won't be there if mmu-masters property is removed.
>>
>> But, it looks there are multiple devices in iommu_groups 3:
>>
>> root@ls2085a_rdb-4:~# cat /sys/kernel/iommu_groups/3/devices/0000\:0
>> 0000:00:00.0/ 0000:01:00.0/ 0000:01:00.1/
>>
>> It is group 2 if mmu-masters property is removed.
>>
>> I have one Intel e1000 NIC on my PCIe bus, both 0000:01:00.0 and
>> 0000:01:00.1 is for the NIC.
>>
>> Is this behavior expected?
>
> Yup - since the root complex doesn't support ACS, the IOMMU API puts all
> the devices behind it (in this case the NIC and the bridge itself) in
> the same group, because otherwise it might be possible for two devices
> assigned to different guests to DMA directly to each other without going
> through the IOMMU.
>
>>> Secondly, the stream match conflict can only occur if the SMRs are
>>> actually programmed. Since with Will's fix for the conflicts Eric saw we
>>> should attach to the default domain without touching the initial bypass
>>> entries in the SMRs, I'm at a loss to see how you could still get into
>>> this state with that patch applied.
>>
>> I mixed up the patch, with Will's fix applied, the issue is gone away.
>
> Phew, that's a relief! Would you be happy to give a Tested-by on that
> patch?

Sure, I just added my Tested-by to that patch.

Thanks for your help.

Regards,
Yang

>
> Thanks,
> Robin.
>
>>
>> Thanks,
>> Yang
>>
>>>
>>> As the transactions provoking the fault are apparently instruction
>>> fetches on a 00xx stream ID, which I've not seen before, my first guess
>>> would be it's something to do with the management complex (which I can't
>>> get to work with the staging driver due to firmware incompatibility),
>>> but then looking at the general lack of connection to the DMA API within
>>> that driver, maybe not?
>>>
>>> Robin.
>>>
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>
>>>>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A
>>>>> earlier this week - so there's clearly something going on there.
>>>>>
>>>>> More generally, I'd note that the mmu-masters binding will never fully
>>>>> work on this board - you can get the platform devices to cooperate by
>>>>> programming the assorted ICID registers to ensure they present unique
>>>>> stream IDs, but PCI devices cannot work at all because there's no way to
>>>>> make the stream IDs coming out of the root complex be equal to the PCI
>>>>> RID in the way it relies on. In that sense, any regression here is quite
>>>>> likely just a shift from "subtly not working" to "loudly and obnoxiously
>>>>> not working". Conversely, those reasons have also proved it a really
>>>>> useful platform for implementing and testing the iommu-map binding[2]
>>>>> (with an awful hack in the PCI driver to program the lookup table
>>>>> suitably) :D
>>>>>
>>>>> Robin.
>>>>>
>>>>> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810
>>>>> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yang
>>>>>>
>>>>>
>>>>
>>>
>>
>
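For anyone reading the fault log quoted in this thread, the two readings given (a stream match conflict fault, caused by instruction fetches) can be reproduced by decoding the register values bit by bit. The sketch below is not taken from the driver; the bit positions are an assumption based on my recollection of the ARM SMMUv2 global fault register layout and should be checked against the architecture specification before being relied on. The decode helper and the two tables are hypothetical names for illustration.

```python
# ASSUMED bit layout of the SMMUv2 global fault status register (GFSR);
# verify against the architecture specification.
GFSR_BITS = {
    0: "ICF",     # invalid context fault
    1: "USF",     # unidentified stream fault
    2: "SMCF",    # stream match conflict fault
    3: "UCBF",    # unimplemented context bank fault
    4: "UCIF",    # unimplemented context interrupt fault
    5: "CAF",     # configuration access fault
    6: "EF",      # external fault
    7: "PF",      # permission fault
    31: "MULTI",  # more than one fault recorded
}

# ASSUMED bit layout of the global fault syndrome register 0 (GFSYNR0).
GFSYNR0_BITS = {
    0: "NESTED",   # fault on a nested translation
    1: "WNR",      # write, not read
    2: "PNU",      # privileged, not unprivileged
    3: "IND",      # instruction, not data
    4: "NSSTATE",  # non-secure state
    5: "NSATTR",   # non-secure attribute
}


def decode(value, names):
    """Return the names of all set bits in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Values taken from the log lines quoted in this thread.
    print("GFSR   ", decode(0x80000004, GFSR_BITS))
    print("GFSYNR0", decode(0x00000008, GFSYNR0_BITS))
```

Under those assumed layouts, GFSR 0x80000004 decodes to SMCF plus MULTI and GFSYNR0 0x00000008 decodes to IND, which agrees with the diagnosis given in the thread.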
end of thread, other threads:[~2016-04-15 17:55 UTC | newest]

Thread overview: 7+ messages
2016-04-13 19:31 SMMU problem found on LS2085A with 4.6-rc3 Shi, Yang
2016-04-14 12:04 ` Robin Murphy
2016-04-14 23:07   ` Shi, Yang
2016-04-15 12:30     ` Robin Murphy
2016-04-15 17:19       ` Shi, Yang
2016-04-15 17:44         ` Robin Murphy
2016-04-15 17:55           ` Shi, Yang