From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marek Vasut
Date: Tue, 10 Jul 2018 18:11:59 +0200
Subject: [U-Boot] SoCFPGA PL330 DMA driver and ECC scrubbing
In-Reply-To: <5a6e8f4a-01b7-634f-2cca-79a4f11f91ff@gmail.com>
References: <73d4ce24-2b78-d27c-1dbf-ebef00e9689b@gmail.com>
 <93c4b5bd-ea97-f97b-e044-6a40b5099cdb@denx.de>
 <4c001307-e555-b085-9982-4a6e0f8fb669@gmail.com>
 <57651474-ac6c-b5d2-e675-1d01de30dbca@denx.de>
 <277351bc-f902-826b-f9f3-f00f380b1e81@denx.de>
 <1aecf0e6-56f1-9254-25d8-5d45c5c9be59@gmail.com>
 <5dceaad3-353e-b015-0871-93986f3ebc5d@denx.de>
 <9db37c5e-abf9-cbbf-0e6b-b269dfcd6528@gmail.com>
 <04312468-3dcc-3fa4-3e70-2fcc07e1051e@denx.de>
 <8523e998-9ae2-e713-60ec-6d740ab0d3c4@gmail.com>
 <7d7d8558-98f2-c0df-02ad-a38727e61d71@gmail.com>
 <5a6e8f4a-01b7-634f-2cca-79a4f11f91ff@gmail.com>
Message-ID:
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: u-boot@lists.denx.de

On 07/10/2018 02:10 PM, Jason Rush wrote:
> On 7/9/2018 3:08 AM, Marek Vasut wrote:
>> On 07/07/2018 12:56 AM, Jason Rush wrote:
>>> On 7/5/2018 6:10 PM, Marek Vasut wrote:
>>>> On 07/06/2018 01:11 AM, Jason Rush wrote:
>>>>> On 7/4/2018 2:23 AM, Marek Vasut wrote:
>>>>>> On 07/04/2018 01:45 AM, Jason Rush wrote:
>>>>>>> On 7/3/2018 9:08 AM, Marek Vasut wrote:
>>>>>>>> On 07/03/2018 03:58 PM, Jason Rush wrote:
>>>>>>>>> On 6/29/2018 10:17 AM, Marek Vasut wrote:
>>>>>>>>>> On 06/29/2018 05:06 PM, Jason Rush wrote:
>>>>>>>>>>> On 6/29/2018 9:52 AM, Marek Vasut wrote:
>>>>>>>>>>>> On 06/29/2018 04:44 PM, Jason Rush wrote:
>>>>>>>>>>>>> On 6/29/2018 9:34 AM, Marek Vasut wrote:
>>>>>>>>>>>>>> On 06/29/2018 04:31 PM, Jason Rush wrote:
>>>>>>>>>>>>>>> Dinh,
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A while ago, you posted the following patchset for SoCFPGA to add the PL330
>>>>>>>>>>>>>>> DMA driver, and updated the SoCFPGA SDRAM init to write zeros to SDRAM to
>>>>>>>>>>>>>>> initialize the ECC bits if ECC was enabled:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2016-October/269643.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I know it's been a long time, so I'll summarize some of the conversation...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At the time, you had a problem with the patchset causing the SPL to fail to
>>>>>>>>>>>>>>> find the MMC. You had tracked it down to an issue with the following commit
>>>>>>>>>>>>>>> "a78cd8613204 ARM: Rework and correct barrier definitions". You and Marek
>>>>>>>>>>>>>>> discussed it a bit, but I don't think there was a real conclusion. You
>>>>>>>>>>>>>>> submitted a second version of the patchset asking for advice on debugging
>>>>>>>>>>>>>>> the issue:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://lists.denx.de/pipermail/u-boot/2016-December/275822.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No real conversation came from the second patchset, and that was the end of
>>>>>>>>>>>>>>> the patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was hoping we could revisit adding your patchset again. I am working on a
>>>>>>>>>>>>>>> custom SoCFPGA board with a Cyclone V and ECC SDRAM. I rebased your patchset
>>>>>>>>>>>>>>> against v2018.05 and it is working on my custom board (although I don't have
>>>>>>>>>>>>>>> an MMC). I also tested it on a SoCKit booting from an MMC (I forced it to
>>>>>>>>>>>>>>> scrub the SDRAM on the SoCKit, because it doesn't have ECC RAM), and the
>>>>>>>>>>>>>>> SoCKit finds the MMC and boots.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't have any suggestions on why it is working now on my board and not
>>>>>>>>>>>>>>> back when you first submitted the patchset. Maybe something else was fixed
>>>>>>>>>>>>>>> in the MMC? I was hoping you and Marek could test this patch again on some
>>>>>>>>>>>>>>> different SoCFPGA boards to see if you get the same results.
>>>>>>>>>>>>>> Look at this patch
>>>>>>>>>>>>>> http://git.denx.de/?p=u-boot/u-boot-socfpga.git;a=commit;h=9bb8a249b292d26f152c20e3641600b3d7b3924b
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You likely want a similar approach, it's faster than the DMA and much simpler.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Marek.  I'll give it a try.  Would you be interested in a similar patch for the Gen 5?
>>>>>>>>>>>> I don't have any Gen5 board which uses ECC, do you ?
>>>>>>>>>>>> If so, yes, prepare a patch, it should be very similar.
>>>>>>>>>>>>
>>>>>>>>>>>> Make sure to measure how long it takes to scrub the memory and how much
>>>>>>>>>>>> memory you have, I'd be interested in the numbers.
>>>>>>>>>>>>
>>>>>>>>>>> Looking at the master branch, it doesn't look like that code is ever being called?
>>>>>>>>>>> The sdram_init_ecc_bits() function is called from the ddr_calibration_sequence() function,
>>>>>>>>>>> but I can't find where ddr_calibration_sequence() is called.
>>>>>>>>>> git grep for it, it's called from somewhere in the arch/arm/mach-socfpga/
>>>>>>>>>>
>>>>>>>>>>> Either way, I can test it. I have a custom Cyclone V board with ECC, and the Intel Arria V SoC
>>>>>>>>>>> Dev Kit I can test it on too which I think has ECC.
>>>>>>>>>> Please do.
>>>>>>>>>>
>>>>>>>>> I implemented a similar memset approach for the gen 5 socfpga.  It's basically the same
>>>>>>>>> code as in that patch; however, when I performed a single memset the processor would
>>>>>>>>> reset for some reason.  I changed it to loop over calling memset with a size of 32MB over
>>>>>>>>> the entire address range, and that worked as opposed to doing a single memset on
>>>>>>>>> the RAM.
>>>>>>>> Can you do grep MEMSET .config in your U-Boot build dir ? The arch
>>>>>>>> memset is implemented in assembler and doesn't trigger WDT , so if it
>>>>>>>> takes too long, it could be that the WDT resets the platform.
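For reference, the chunked workaround discussed above can be sketched in plain C. This is a host-side illustration only, not the code from either patch: scrub_chunked(), the wdt_kick_fn callback and SCRUB_CHUNK are hypothetical names, and the callback stands in for whatever watchdog service routine the platform provides (in U-Boot of this era, presumably WATCHDOG_RESET()).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 32 MiB, the chunk size mentioned in the thread (hypothetical name). */
#define SCRUB_CHUNK (32u * 1024u * 1024u)

/* Hypothetical callback standing in for the platform's watchdog service. */
typedef void (*wdt_kick_fn)(void);

/*
 * Zero [base, base + size) in pieces of at most `chunk` bytes and call
 * kick() after each piece, so that no single memset() runs longer than
 * the watchdog timeout. Returns the number of pieces written.
 */
size_t scrub_chunked(unsigned char *base, size_t size, size_t chunk,
		     wdt_kick_fn kick)
{
	size_t done = 0, pieces = 0;

	assert(chunk > 0);
	while (done < size) {
		size_t n = size - done < chunk ? size - done : chunk;

		memset(base + done, 0, n);
		if (kick)
			kick();	/* service the watchdog between pieces */
		done += n;
		pieces++;
	}
	return pieces;
}
```

With a 1 GiB region and 32 MiB chunks this services the watchdog 32 times, so no single memset has to finish within the timeout.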
>>>>>>> Both CONFIG_USE_ARCH_MEMSET and CONFIG_SPL_USE_ARCH_MEMSET
>>>>>>> are set in my .config, so it must be the WDT triggering as you suspect.
>>>>>>>
>>>>>>>>> I started on a SoCKit because it was handy, I know it doesn't have ECC
>>>>>>>> It doesn't by default.
>>>>>>>>
>>>>>>>>> , but I forced it to
>>>>>>>>> initialize the RAM as a quick test.  It seems much slower than the DMA approach.  It
>>>>>>>>> should be noted, I didn't implement any code to time the scrubbing, but rather just
>>>>>>>>> roughly monitored the time to get a rough idea of how long it took.
>>>>>>>>>
>>>>>>>>> On the SoCKit, which has 1GB of RAM, the memset takes around 8 seconds to complete,
>>>>>>>>> and the DMA takes under 2 seconds.
>>>>>>>> Did you enable i/d cache in the SPL ? It's mandatory, otherwise it's
>>>>>>>> slow.
>>>>>>> I have calls to icache_enable() and dcache_enable() just as you do in
>>>>>>> the Arria 10 sdram_init_ecc_bits() function.
>>>>>>>
>>>>>>> I did double check that both these enable functions call the versions
>>>>>>> of the functions in the ./arch/arm/lib/cache-cp15.c file that are
>>>>>>> implemented in the SPL.  So I believe that both icache and dcache are
>>>>>>> enabled.
>>>>>> Are you sure it's not just the stubs that are called ? Or that the code
>>>>>> doesn't skip the dcache enabling due to some funny stuff, like MMU being
>>>>>> already enabled ?
>>>>> I added prints to ensure it is calling the real icache_enable()/dcache_enable()
>>>>> functions, and not the stubs.
>>>>>
>>>>>>> I probably should have added a print of icache_status() and
>>>>>>> dcache_status() to verify the caches are enabled.  I'll add that
>>>>>>> tomorrow.
>>>>>> Yes, you really should verify that the dcache was enabled.
>>>>>>
>>>>>>>> Just be careful about the MMU tables placement, they are big and
>>>>>>>> if you place them in RAM, make sure you don't overwrite them with the
>>>>>>>> memset.
>>>>>>>> The trick might be to memset the first 1 MiB of RAM, then put
>>>>>>>> MMU tables at some offset therein (since 0x0 can be used for ARM
>>>>>>>> vectors) and then turn on i/d cache and memset the rest.
>>>>>>> That is essentially what I am doing I believe, with the exception that I
>>>>>>> am only clearing the first 32KiB before initializing the MMU table (which
>>>>>>> is what you did in the Arria 10 version).
>>>>>>>
>>>>>>> I modeled my code almost identically to yours with the exception that
>>>>>>> I loop over the memset calls 32MiB at a time. Here's the order of
>>>>>>> operations I perform:
>>>>>>>
>>>>>>> 1. icache_enable()
>>>>>>> 2. memset the first 0x8000 bytes to zero
>>>>>>> 3. set up gd->arch.tlb_addr and gd->arch.tlb_size
>>>>>>> 4. dcache_enable()
>>>>>>> 5. loop over remaining memory, memsetting 32MiB at a time to zero
>>>>>>> 6. flush_dcache_all()
>>>>>>> 7. dcache_disable()
>>>>>>>
>>>>>>> It looks like the call to dcache_enable is what sets up the MMU tables.
>>>>>>> I suspect that's why you did a memset of the first 32KiB before enabling
>>>>>>> the dcache on the Arria 10.  I think the MMU is initialized okay since the
>>>>>>> SPL keeps executing, u-boot loads, and Linux boots after running the
>>>>>>> above (maybe that's not a fair assumption).
>>>>>> I had to write zeroes to the first 32kiB to init the ECC counters before
>>>>>> putting MMU tables there.
>>>>>>
>>>>>> You really should double check if the MMU and dcache are enabled, 8
>>>>>> seconds to scrub the memory is too long I think.
>>>>> I added checks to verify that the MMU, icache, and dcache are all set up and
>>>>> enabled.
>>>>>
>>>>> Calling icache_enable() set the CR_I bit (Icache enable) in the CR (control
>>>>> register).  Then calling dcache_enable() called the mmu_setup() function,
>>>>> which set up the MMU and set the CR_M bit (MMU enable) in the CR, and
>>>>> finally dcache_enable() set the CR_C bit (Dcache enable) in the CR.
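The control register check described above amounts to testing three bits. A minimal sketch, assuming the ARMv7 SCTLR layout (M = bit 0, C = bit 2, I = bit 12), which is what U-Boot's CR_M, CR_C and CR_I name; caches_fully_enabled() is a hypothetical helper, not a U-Boot function:

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7 SCTLR bit positions, named as in U-Boot. */
#define CR_M (1u << 0)	/* MMU enable */
#define CR_C (1u << 2)	/* data cache enable */
#define CR_I (1u << 12)	/* instruction cache enable */

/* Hypothetical helper: true only if MMU, dcache and icache are all on. */
bool caches_fully_enabled(uint32_t cr)
{
	const uint32_t need = CR_M | CR_C | CR_I;

	return (cr & need) == need;
}
```

All three bits being set only shows that the MMU and caches are switched on; it says nothing about whether the DRAM region is mapped as cacheable in the page tables.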
>>>>>
>>>>> I also printed out the control register before the memset calls, and it
>>>>> indicated that the mmu, icache, and dcache were enabled.
>>>> Is the DRAM area set as cacheable in the MMU tables ?
>>>>
>>> Good news bad news...  The MMU tables weren't being set up because the
>>> bd->bi_dram[bank].start and bd->bi_dram[bank].size weren't set up.  As a quick
>>> test, I hardcoded start to 0 and size to 1GiB.  After that, the memset was
>>> really quick, U-Boot loads, Linux loads, and everything seems to work great.
>> Good.
>>
>>> However, if I press the HPS_RST push button on the SoCKit (which is connected
>>> to power on reset), occasionally U-Boot will lock up while booting.  It always
>>> boots and operates correctly from the initial power on, but it almost always
>>> fails to boot after pressing the HPS_RST button.
>>>
>>> Usually after pressing the HPS_RST button, U-Boot makes it past the SPL, and
>>> hangs somewhere after the call to setup_reloc() in ./common/board_f.c.  Once
>>> it hangs there, pressing the HPS_RST button again usually causes the SPL to
>>> hang while setting up the MMU (before my call to memset).  Eventually the
>>> WDT kicks in, and it just keeps hanging up in the same place.  Once it gets in
>>> this mode, the only way to recover it is by toggling power on the board.
>>>
>>> I spent a bunch of time today trying to track down where it was hanging, but
>>> I couldn't pinpoint anything.  The MMU tables looked correct.  The MMU
>>> registers looked good.  I'm not sure the best way to debug what's going on.
>> Try triggering warm reset and cold reset via the reset register:
>>
>> mw 0xffd05004 1
>> mw 0xffd05004 2
>>
>> Does it hang in one case and not in the other ?
>>
> It hangs in both cases.
>
> I did find that if I do not memset the last 1MiB of DRAM with the cache on,
> both warm and cold resets work.
>
> I changed the ecc scrubbing to zero out the first 0x8000 bytes and the last
> 0x10000 bytes before the MMU is set up and I enable dcache.  Then with
> the dcache enabled, I zero out the rest of memory.  The resets work in this
> case as well.  So there seems to be some side effect of clearing out the
> relocate address space with the cache on.

Can you investigate ?

-- 
Best regards,
Marek Vasut
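
For reference, the split described in the quoted message can be written down as a small planning helper. This is only a sketch under stated assumptions: struct scrub_plan and plan_scrub() are hypothetical names, and the 0x8000 and 0x10000 sizes are taken from the message as written. It computes the three regions and leaves the actual zeroing and cache handling to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define HEAD_SIZE 0x8000u	/* zeroed before MMU/dcache setup */
#define TAIL_SIZE 0x10000u	/* also zeroed with the dcache off */

/* Hypothetical description of the three scrub regions. */
struct scrub_plan {
	uint64_t head_start, head_len;	/* scrubbed uncached */
	uint64_t mid_start, mid_len;	/* scrubbed with dcache enabled */
	uint64_t tail_start, tail_len;	/* scrubbed uncached */
};

/* Split [base, base + size) into head, middle and tail regions. */
struct scrub_plan plan_scrub(uint64_t base, uint64_t size)
{
	struct scrub_plan p;

	assert(size >= (uint64_t)HEAD_SIZE + TAIL_SIZE);
	p.head_start = base;
	p.head_len = HEAD_SIZE;
	p.tail_len = TAIL_SIZE;
	p.tail_start = base + size - TAIL_SIZE;
	p.mid_start = base + HEAD_SIZE;
	p.mid_len = size - HEAD_SIZE - TAIL_SIZE;
	return p;
}
```

For 1 GiB of DRAM at address 0 this gives a cached middle region from 0x8000 up to 0x3FFF0000, with only the first 32 KiB and last 64 KiB written while the dcache is off.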