From mboxrd@z Thu Jan 1 00:00:00 1970
From: jgunthorpe@obsidianresearch.com (Jason Gunthorpe)
Date: Fri, 10 Jan 2014 17:19:32 -0700
Subject: [PATCH] ARM: Kirkwood: Add support for Excito Bubba B3
In-Reply-To: <20140110230248.GO9681@lunn.ch>
References: <1388247131-19301-1-git-send-email-andrew@lunn.ch>
 <20131228170114.GH19878@titan.lakedaemon.net>
 <20140102194924.GA3321@obsidianresearch.com>
 <1388702192.16958.54.camel@hastur.hellion.org.uk>
 <20140102230832.GB9339@obsidianresearch.com>
 <20140110192032.GQ19878@titan.lakedaemon.net>
 <20140110194437.GJ18269@obsidianresearch.com>
 <20140110230248.GO9681@lunn.ch>
Message-ID: <20140111001931.GM18269@obsidianresearch.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Sat, Jan 11, 2014 at 12:02:48AM +0100, Andrew Lunn wrote:
> > > So, I'd prefer to handle this more gracefully.  I don't have much
> > > experience at the low-level init of the caches, couldn't we enable and
> > > flush rather than throwing the error?
> >
> > It cannot be solved in cache-feroceon-l2.c.
> >
> > In many cases you won't even get that far:
> >  - It will blow up in head.S when the cache is off: decompressor wrote
> >    writeback data into the L2 and uncached fetches see memory
> >    content prior to decompression
> >  - It will blow up after head.S enables the L1: decompressor wrote
> >    data into the L2 but the relocation writes done with the L1 off
> >    made changes to memory that were not captured in the L2.
>
> My impression with booting a lot of times while doing a bisect, was
> that it got as far as:
>
> Uncompressing Linux... done, booting the kernel.
> Booting Linux on physical CPU 0x0
> Linux version 3.13.0-rc1-dirty (lunn at laptop) (gcc version 4.4.5 (Debian 4.4.5-8) ) #7 PREEMPT Fri Jan 3 09:25:07 CST 2014
> CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
> CPU: VIVT data cache, VIVT instruction cache
> Machine: Marvell Kirkwood (Flattened Device Tree), model: QNAP TS219 family
> Memory policy: ECC disabled, Data cache writeback
>
> every time.

It should be deterministic, but 'random' in the sense that it is
sensitive to exactly the data pattern generated by the decompressor
and exactly the data pattern generated by the relocation writes. So
the corruption will vary depending on the kernel image content
(position of the relocations) and also on the size of the L2.

You are likely seeing the 2nd failure I predicted, which occurs when
a cache line containing a relocation is used and that line happened
to be in the L2.

The first failure would require the kernel image to be smaller than
the cache, or the cache eviction policy to be such that the initial
part of the decompressed image remains dirty and unevicted in the L2.

> It does however seem possible to insert platform specific code soon
> after it prints:
>
> Machine: Marvell Kirkwood (Flattened Device Tree), model: QNAP TS219 family

So if you attempt to do the flush/disable at this point you are
relying on the L2 containing no dirty lines that overlap with lines
that were changed by the relocator while the cache was off.

This is *probably* true in many cases, but it is sketchy, and it
creates a hidden, hard-to-find pitfall. This is basically the same
'probably' that enables non-relocated kernels to boot.

Basically, you might get a bootable system, but it is technically
wrong and has a theoretical hole.

Jason
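
To make the overlap hazard above concrete, here is a toy model. It is a sketch only, not kernel code and not how the Feroceon L2 is implemented; ToyL2, boot(), the 4-word line size and the sample values are all made up for illustration. It assumes a plain write-back cache that is filled while enabled, bypassed while disabled, and flushed late.

```python
LINE = 4  # words per cache line (toy value, not the real L2 line size)


class ToyL2:
    """Hypothetical write-back cache: dirty lines reach memory only on
    eviction or flush."""

    def __init__(self, memory):
        self.memory = memory      # backing store; len must be a multiple of LINE
        self.lines = {}           # line index -> cached copy of that line
        self.dirty = set()
        self.enabled = True

    def write(self, addr, value):
        if not self.enabled:
            self.memory[addr] = value          # uncached write, L2 not updated
            return
        idx = addr // LINE
        if idx not in self.lines:
            base = idx * LINE
            self.lines[idx] = self.memory[base:base + LINE]  # line fill (copy)
        self.lines[idx][addr % LINE] = value
        self.dirty.add(idx)

    def evict(self, idx):
        """Write one line back to memory and drop it from the cache."""
        if idx in self.dirty:
            base = idx * LINE
            self.memory[base:base + LINE] = self.lines[idx]
            self.dirty.discard(idx)
        self.lines.pop(idx, None)

    def flush(self):
        for idx in sorted(self.lines):
            self.evict(idx)


def boot(image, reloc_addr, reloc_value, evicted_lines=()):
    """Decompress with the L2 on, relocate with it off, then flush late."""
    memory = [0] * len(image)
    l2 = ToyL2(memory)
    for addr, word in enumerate(image):    # decompressor output lands in the L2
        l2.write(addr, word)
    for idx in evicted_lines:              # lines that happened to get evicted
        l2.evict(idx)
    l2.enabled = False
    l2.write(reloc_addr, reloc_value)      # relocation fixup, cache bypassed
    l2.flush()                             # the proposed late flush/disable
    return memory


image = list(range(100, 108))              # two toy cache lines of "kernel"
lost = boot(image, 5, 999)                 # line 1 still dirty: fixup clobbered
kept = boot(image, 5, 999, evicted_lines=(1,))  # line 1 evicted: fixup survives
```

In the first call the dirty line is written back over the relocated word, so lost[5] is the old value 105. In the second call the line was evicted before the relocation, so kept[5] stays 999. Whether a real boot lands in the first or the second case depends on the image layout and the eviction behaviour, which is the 'probably' above.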