From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Warren
Date: Tue, 17 Mar 2015 11:29:16 -0600
Subject: [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations
In-Reply-To: <550840D1.9060608@gmail.com>
References: <1426227189-30488-1-git-send-email-swarren@wwwdotorg.org>
 <5505AD75.2030607@wwwdotorg.org>
 <201503151920.56824.marex@denx.de>
 <550799BA.5000409@wwwdotorg.org>
 <550840D1.9060608@gmail.com>
Message-ID: <5508646C.5040802@wwwdotorg.org>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 03/17/2015 08:57 AM, popcorn mix wrote:
> On 17/03/15 03:04, Stephen Warren wrote:
>> It would be nice though if someone from the RPi Foundation could comment
>> on the exact effect of the upper bus address bits, and why 0xc would
>> work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
>> (enabled, disabled) interacts with the GPU cache enable in any way, e.g.
>> burst vs. non-burst transactions on the bus or something? That's about
>> the only reason I can see for the RPi Foundation kernel working with 0x4
>> bus addresses on both chips, but U-Boot needing something different on
>> RPi2...
>>
>> Dom, for reference, see:
>> http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
>> http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

Thanks for the great explanation. I'll have to bookmark/archive it:-)

> First, remember that 2835 is a large GPU with a small ARM attached. On
> some platforms the ARM is not even used.
> The GPU boots first and may wake the arm. The GPU is the centre of the
> universe, and the ARM has to fit in.
>
> Okay, I'll try to explain what goes on. Here are my definitions of some
> terms:
>
> bus address: a VideoCore/GPU address. The lower 30-bits define the 1G of
> addressable memory. The top two bits define the caching alias.
> physical address: An ARM side address given to the VC MMU.
> This is a 30 bit address space.
>
> The GPU always uses bus addresses. GPU bus mastering peripherals (like
> DMA) use bus addresses. The ARM uses physical addresses.
>
> VC MMU: A coarse MMU used by the arm for accessing GPU memory. Each page
> is 16M and there are 64 pages. This maps 30-bits of physical address to
> 32-bits of bus address.
>
> The setup of VC MMU is handled by the GPU and by default the mapping is:
> 2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus
> addresses 0x40000000-0x5fffffff. The next page maps physical address
> 0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff
>
> 2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus
> addresses 0xc0000000-0xfeffffff. The next page maps physical address
> 0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff

OK, this explains why in U-Boot, we need to OR in 0x40000000 on bcm2835
and 0xc0000000 on bcm2836; that matches the VC MMU setup. I guess we need
to fix the U-Boot mailbox driver too, and many things in the upstream RPi
kernel.

I have two more questions:

1) Do the RPi 1 and RPi 2 use different kernel binaries in the RPi
Foundation's images? I'd assumed there was a single unified binary which
supported both. The reason I ask is that I see:

> https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/include/mach/memory.h#L38
> #ifdef CONFIG_BCM2708_NOL2CACHE
>  #define _REAL_BUS_OFFSET UL(0xC0000000)   /* don't use L1 or L2 caches */
> #else
>  #define _REAL_BUS_OFFSET UL(0x40000000)   /* use L2 cache */
> #endif

That's identical in the mach-bcm2709 version too. However,
arch/arm/mach-bcm270[89]/Kconfig's entry for that config option:

> config BCM2708_NOL2CACHE
> 	bool "Videocore L2 cache disable"
> 	depends on MACH_BCM2709
> 	default y
> 	help
> 	  Do not allow ARM to use GPU's L2 cache. Requires disable_l2cache in config.txt.

has "default n" for the bcm2708 version and "default y" for the bcm2709
version.
If I'd noticed that difference in default value, it would have been a big
clue that what I proposed in the U-Boot patch was correct! Anyway, this
implies that there are separate kernel binaries for the RPi 1 and RPi 2,
since otherwise those default values wouldn't work.

2) I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by
this just as much; we need to use bus addresses not ARM physical
addresses when programming any DMA there? Perhaps this would explain why
I had issues with the eMMC on the CM (I think only in the kernel though,
whereas U-Boot may have been fine; I'll have to check) ...

> So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a
> 128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's
> not particularly close (i.e. not very fast).
> However mapping through the L2 allocating alias (0x4) was shown to be
> beneficial on 2835, so that is the alias we use.
>
> The situation is different on 2836. The ARM has a 32K L1 cache and a
> 512M integrated/fast L2 cache. Additionally going through the
> smaller/slower GPU L2 is bad for performance.
> So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

I assume 128M and 512M there should be 128K and 512K?
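
For reference, the default SDRAM translation described above (OR in
0x40000000 on bcm2835, 0xc0000000 on bcm2836) could be sketched roughly as
follows. The macro and function names here are made up for illustration
and are not taken from U-Boot or the RPi kernel. The sketch assumes the
default VC MMU setup and only covers SDRAM; the peripheral page, which maps
to bus addresses 0x7e000000-0x7effffff, is not handled.

```c
#include <stdint.h>

/* Hypothetical names; values are the default VC MMU aliases quoted above. */
#define BCM2835_BUS_ALIAS 0x40000000u /* L2 allocating alias (0x4), RPi 1 */
#define BCM2836_BUS_ALIAS 0xc0000000u /* SDRAM alias (0xc), RPi 2 */
#define BUS_ADDR_MASK     0x3fffffffu /* lower 30 bits: the memory location */

/* ARM physical SDRAM address -> bus address for a DMA-capable peripheral. */
static inline uint32_t phys_to_bus(uint32_t phys, uint32_t alias)
{
	/* The top two bits select the caching alias; the rest is the address. */
	return (phys & BUS_ADDR_MASK) | alias;
}

/* Bus address -> ARM physical SDRAM address (drops the alias bits). */
static inline uint32_t bus_to_phys(uint32_t bus)
{
	return bus & BUS_ADDR_MASK;
}
```

A DMA-capable driver would then pass something like
phys_to_bus(buf_phys, BCM2836_BUS_ALIAS) to the controller instead of the
raw ARM physical address, where buf_phys is again a placeholder name.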