From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: framebuffer corruption due to overlapping stp instructions on arm64
To: Mikulas Patocka , Ard Biesheuvel
References: <20180803094129.GB17798@arm.com>
From: Robin Murphy
Message-ID: <99fff4fe-afa9-f12f-a518-472a9dd1c530@arm.com>
Date: Mon, 6 Aug 2018 13:42:20 +0100
MIME-Version: 1.0
Cc: Thomas Petazzoni , Joao Pinto , Catalin Marinas , linux-pci , Will Deacon , Russell King , Linux Kernel Mailing List , Matt Sealey , Jingoo Han , linux-arm-kernel
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+bjorn=helgaas.com@lists.infradead.org

On 06/08/18 11:25, Mikulas Patocka wrote:
[...]
>> None of this explains why some transactions fail to make it across
>> entirely. The overlapping writes in question write the same data to
>> the memory locations that are covered by both, and so the ordering in
>> which the transactions are received should not affect the outcome.
>
> You're right that the corruption couldn't be explained just by reordering
> writes. My hypothesis is that the PCIe controller tries to disambiguate
> the overlapping writes, but the disambiguation logic was not tested and it
> is buggy. If there's a barrier between the overlapping writes, the PCIe
> controller won't see any overlapping writes, so it won't trigger the
> faulty disambiguation logic and it works.
>
> Could the ARM engineers look if there's some chicken bit in Cortex-A72
> that could insert barriers between non-cached writes automatically?

I don't think there is, and even if there was I imagine it would have a
pretty hideous effect on non-coherent DMA buffers and the various other
places in which we have Normal-NC mappings of actual system RAM.
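
To illustrate the point quoted above, that overlapping stores carrying
identical data in the overlap give the same result in either order, here
is a minimal C sketch. It is a software model only and says nothing about
what the bus or the root complex actually does. The names store16 and
overlap_order_independent, the 32-byte buffer, and the offsets 0 and 8 are
assumptions made for illustration; they are not taken from the test
program or from the arm64 memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 32   /* size of the simulated destination window */
#define STP_LEN 16   /* one stp of two 64-bit registers stores 16 bytes */

/* Model one 16-byte store: copy src[off..off+15] to dst[off..off+15]. */
static void store16(uint8_t *dst, const uint8_t *src, size_t off)
{
    assert(off + STP_LEN <= BUF_LEN);
    memcpy(dst + off, src + off, STP_LEN);
}

/*
 * Copy 24 bytes as two overlapping 16-byte stores (offsets 0 and 8),
 * once in program order and once reversed. Returns 1 if both orders
 * leave identical memory and the first 24 bytes match the source.
 */
static int overlap_order_independent(void)
{
    uint8_t src[BUF_LEN], a[BUF_LEN], b[BUF_LEN];

    for (size_t i = 0; i < BUF_LEN; i++)
        src[i] = (uint8_t)(i + 1);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    store16(a, src, 0);   /* program order: low store first */
    store16(a, src, 8);

    store16(b, src, 8);   /* reversed order: high store first */
    store16(b, src, 0);

    return memcmp(a, b, BUF_LEN) == 0 && memcmp(a, src, 24) == 0;
}
```

Bytes 8..15 are written twice with the same values, so in this model the
final contents cannot depend on which store lands first. A target that
ends up with different contents for this pattern is therefore doing
something other than plain reordering.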

> I observe these kinds of corruptions:
> - failing to write a few bytes

That could potentially be explained by the reordering/atomicity issues
Matt mentioned, i.e. the load is observing part of the store, before the
store has fully completed.

> - writing a few bytes that were written 16 bytes before
> - writing a few bytes that were written 16 bytes after

Those sound more like the interconnect or root complex ignoring the byte
strobes on an unaligned burst, of which I think the simplistic view would
be "it's broken".

FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x
Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and
it's still happily flickering pixels in the corner of the console after
nearly an hour (in parallel with some iperf3 just to ensure plenty of
PCIe traffic). I would strongly suspect this issue is particular to
Armada 8k, so it's probably one for the Marvell folks to take a closer
look at - I believe some previous interconnect issues on those SoCs were
actually fixable in firmware.

Robin.