linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
@ 2015-05-04 19:33 Eric Anholt
  2015-05-04 20:25 ` Noralf Trønnes
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Eric Anholt @ 2015-05-04 19:33 UTC
  To: linux-arm-kernel

There exists a tiny MMU, configurable only by the VC (running the
closed firmware), which maps from the ARM's physical addresses to bus
addresses.  These bus addresses determine the caching behavior in the
VC's L1/L2 (note: separate from the ARM's L1/L2) according to the top
2 bits.  The bits in the bus address mean:

From the VideoCore processor:
0x0... L1 and L2 cache allocating and coherent
0x4... L1 non-allocating, but coherent. L2 allocating and coherent
0x8... L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

From the GPU peripherals (note: all peripherals bypass the L1
cache. The ARM sees this view once its accesses pass through the VC MMU):
0x0... Do not use
0x4... L1 non-allocating, and incoherent. L2 allocating and coherent
0x8... L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
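To make the tables above concrete: the alias is selected only by bits 31:30 of a 32-bit bus address, so the same SDRAM byte is reachable at four bus addresses. The following C sketch is illustrative only, not code from the kernel or the firmware; the enum and function names are hypothetical, and the comments follow the GPU-peripheral view listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the four SDRAM aliases selected by bits 31:30
 * of a bus address (GPU-peripheral view from the table above). */
enum vc_alias {
	VC_ALIAS_0 = 0,	/* 0x0...: do not use */
	VC_ALIAS_4 = 1,	/* 0x4...: L2 allocating and coherent */
	VC_ALIAS_8 = 2,	/* 0x8...: L2 non-allocating, but coherent */
	VC_ALIAS_C = 3,	/* 0xc...: SDRAM alias, caches bypassed */
};

/* Alias encoded in the top two bits of a bus address. */
static enum vc_alias bus_addr_alias(uint32_t bus_addr)
{
	return (enum vc_alias)(bus_addr >> 30);
}

/* SDRAM offset: the bus address with the two alias bits cleared. */
static uint32_t bus_addr_sdram_offset(uint32_t bus_addr)
{
	return bus_addr & 0x3fffffffu;
}

/* Build a bus address for an SDRAM offset through the chosen alias. */
static uint32_t make_bus_addr(uint32_t sdram_offset, enum vc_alias alias)
{
	assert(sdram_offset <= 0x3fffffffu);	/* must not overlap the alias bits */
	return ((uint32_t)alias << 30) | sdram_offset;
}
```

For example, make_bus_addr(0x00001000, VC_ALIAS_4) yields 0x40001000, which is the form of address this patch makes the kernel hand to devices.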

The 2835 firmware always configures the MMU to turn ARM physical
addresses with 0x0 top bits to 0x4, meaning present in L2 but
incoherent with L1.  However, any bus addresses we were generating in
the kernel to be passed to a device had 0x0 top bits.  That would be a
reserved (possibly totally incoherent) value if sent to a GPU
peripheral like USB, or L1 allocating if sent to the VC (like a
firmware property request).  By setting dma-ranges, all of the devices
below it get a dev->dma_pfn_offset, so that dma_alloc_coherent() and
friends return addresses with 0x4 bits and avoid cache incoherency.
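The dma-ranges entry is a <child-bus-address parent-address length> triple; with one address cell and one size cell here, it says that ARM physical addresses from 0x00000000 upward appear to the devices under this bus starting at 0x40000000. The kernel records this per device as dev->dma_pfn_offset, a page-frame offset; the sketch below does the equivalent byte arithmetic instead. It is a hypothetical illustration, not the kernel's implementation, and struct dma_range and phys_to_bus() are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of one dma-ranges entry
 * (one address cell, one size cell). */
struct dma_range {
	uint32_t child;		/* bus address seen by devices below the node */
	uint32_t parent;	/* ARM physical address */
	uint32_t size;		/* window length in bytes */
};

/* Translate an ARM physical address into a device bus address.
 * Returns false when the address is outside the window. */
static bool phys_to_bus(const struct dma_range *r, uint32_t phys, uint32_t *bus)
{
	assert(r && bus);
	/* Unsigned arithmetic: an address below r->parent wraps around to a
	 * large value and fails the length check. */
	uint32_t off = phys - r->parent;
	if (off >= r->size)
		return false;
	*bus = r->child + off;
	return true;
}
```

With the values from this patch, { 0x40000000, 0x00000000, 0x1f000000 }, a buffer at ARM physical 0x00100000 would be handed to a device as 0x40100000, i.e. through the 0x4 alias.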

This matches the behavior in the downstream 2708 kernel (see
BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h).

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: popcornmix at gmail.com
---
 arch/arm/boot/dts/bcm2835.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
index 5734650..2df1b5c 100644
--- a/arch/arm/boot/dts/bcm2835.dtsi
+++ b/arch/arm/boot/dts/bcm2835.dtsi
@@ -15,6 +15,7 @@
 		#address-cells = <1>;
 		#size-cells = <1>;
 		ranges = <0x7e000000 0x20000000 0x02000000>;
+		dma-ranges = <0x40000000 0x00000000 0x1f000000>;
 
 		timer@7e003000 {
 			compatible = "brcm,bcm2835-system-timer";
-- 
2.1.4

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-04 19:33 [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM Eric Anholt
@ 2015-05-04 20:25 ` Noralf Trønnes
  2015-05-05  0:07   ` Eric Anholt
  2015-05-05 19:31   ` Stephen Warren
  2015-05-05 19:29 ` Stephen Warren
  2015-05-05 20:10 ` [PATCH v2] " Eric Anholt
  2 siblings, 2 replies; 11+ messages in thread
From: Noralf Trønnes @ 2015-05-04 20:25 UTC
  To: linux-arm-kernel


On 04.05.2015 21:33, Eric Anholt wrote:
> [...]

This was quite a coincidence. I discovered the need for 'dma-ranges'
yesterday while trying to get the downstream bcm2708_fb driver to
work with ARCH_BCM2835. The driver is using the mailbox to get info
about the framebuffer from the firmware. When it failed I discovered
that the bus address was wrong.

What I don't understand is why mmc and spi work fine with a "wrong"
bus address. It's only the framebuffer driver and the vchiq driver,
when using the mailbox, that fail.

Tested-by: Noralf Trønnes <noralf@tronnes.org>


Regards,
Noralf Trønnes

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-04 20:25 ` Noralf Trønnes
@ 2015-05-05  0:07   ` Eric Anholt
  2015-05-05 13:33     ` Noralf Trønnes
  2015-05-05 19:31   ` Stephen Warren
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Anholt @ 2015-05-05  0:07 UTC
  To: linux-arm-kernel

Noralf Trønnes <noralf@tronnes.org> writes:

> On 04.05.2015 21:33, Eric Anholt wrote:
>> [...]
>
> This was quite a coincidence. I discovered the need for 'dma-ranges'
> yesterday while trying to get the downstream bcm2708_fb driver to
> work with ARCH_BCM2835. The driver is using the mailbox to get info
> about the framebuffer from the firmware. When it failed I discovered
> that the bus address was wrong.
>
> What I don't understand is why mmc and spi work fine with a "wrong"
> bus address. It's only the framebuffer driver and the vchiq driver,
> when using the mailbox, that fail.
>
> Tested-by: Noralf Trønnes <noralf@tronnes.org>

Yeah, it was the mailbox driver I've been trying to merge, on pi2, that
made me get this patch together.  I'm suspicious that 0x0 works the same
as 0x4 for GPU peripherals (mmc, spi, vc4) on pi1, though I've had
occasional instability (something like 3 events per ~5000 tests) that I
sure hope is due to this.

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-05  0:07   ` Eric Anholt
@ 2015-05-05 13:33     ` Noralf Trønnes
  0 siblings, 0 replies; 11+ messages in thread
From: Noralf Trønnes @ 2015-05-05 13:33 UTC
  To: linux-arm-kernel

On 05.05.2015 02:07, Eric Anholt wrote:
> Noralf Trønnes <noralf@tronnes.org> writes:
>
>> On 04.05.2015 21:33, Eric Anholt wrote:
>> [...]
> Yeah, it was the mailbox driver I've been trying to merge, on pi2, that
> made me get this patch together.  I'm suspicious that 0x0 works the same
> as 0x4 for GPU peripherals (mmc, spi, vc4) on pi1, though I've had
> occasional instability (something like 3 events per ~5000 tests) that I
> sure hope is due to this.

Since you mention the Pi 2:
Dom Cobley made me aware that 0xC is used on MACH_BCM2709.
The macros in arch/arm/mach-bcm270X/include/mach/memory.h are identical,
but arch/arm/mach-bcm2709/Kconfig enables BCM2708_NOL2CACHE by default
(unlike arch/arm/mach-bcm2708/Kconfig).
This changes the _REAL_BUS_OFFSET macro:

#ifdef CONFIG_BCM2708_NOL2CACHE
  #define _REAL_BUS_OFFSET UL(0xC0000000)   /* don't use L1 or L2 caches */
#else
  #define _REAL_BUS_OFFSET UL(0x40000000)   /* use L2 cache */
#endif
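For comparison, a rough sketch of how the two _REAL_BUS_OFFSET values would turn an ARM physical SDRAM address into a bus address. It assumes SDRAM starts at ARM physical 0 and stays below 1 GiB, so adding the offset is the same as OR-ing in the alias bits; the helper names are hypothetical and this is not the downstream memory.h code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two values selected by the #ifdef quoted above. */
#define REAL_BUS_OFFSET_L2	0x40000000u	/* use L2 cache */
#define REAL_BUS_OFFSET_NOL2	0xC0000000u	/* don't use L1 or L2 caches */

/* Hypothetical run-time equivalent of the CONFIG_BCM2708_NOL2CACHE choice. */
static uint32_t real_bus_offset(bool nol2cache)
{
	return nol2cache ? REAL_BUS_OFFSET_NOL2 : REAL_BUS_OFFSET_L2;
}

/* Bus address for an ARM physical SDRAM address (assumed < 1 GiB). */
static uint32_t phys_to_bus_downstream(uint32_t phys, bool nol2cache)
{
	assert(phys < 0x40000000u);	/* must not overlap the alias bits */
	return real_bus_offset(nol2cache) + phys;
}
```

In dma-ranges terms, the NOL2CACHE case would presumably correspond to a child bus address of 0xc0000000 rather than the 0x40000000 used in this patch.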

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-04 19:33 [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM Eric Anholt
  2015-05-04 20:25 ` Noralf Trønnes
@ 2015-05-05 19:29 ` Stephen Warren
  2015-05-05 19:53   ` Eric Anholt
  2015-05-05 20:10 ` [PATCH v2] " Eric Anholt
  2 siblings, 1 reply; 11+ messages in thread
From: Stephen Warren @ 2015-05-05 19:29 UTC
  To: linux-arm-kernel

On 05/04/2015 01:33 PM, Eric Anholt wrote:
> [...]

> diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi

>   		#address-cells = <1>;
>   		#size-cells = <1>;
>   		ranges = <0x7e000000 0x20000000 0x02000000>;
> +		dma-ranges = <0x40000000 0x00000000 0x1f000000>;

Oh well that's a nice and simple patch; I had been avoiding looking into 
fixing the kernel for this since I was worried it'd be rather complex!

I'm puzzled why the length cell of ranges and dma-ranges differs though? 
Assuming there's a good explanation for that,

Acked-by: Stephen Warren <swarren@wwwdotorg.org>

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-04 20:25 ` Noralf Trønnes
  2015-05-05  0:07   ` Eric Anholt
@ 2015-05-05 19:31   ` Stephen Warren
  1 sibling, 0 replies; 11+ messages in thread
From: Stephen Warren @ 2015-05-05 19:31 UTC
  To: linux-arm-kernel

On 05/04/2015 02:25 PM, Noralf Trønnes wrote:
>
> On 04.05.2015 21:33, Eric Anholt wrote:
>> [...]
>
> This was quite a coincidence. I discovered the need for 'dma-ranges'
> yesterday while trying to get the downstream bcm2708_fb driver to
> work with ARCH_BCM2835. The driver is using the mailbox to get info
> about the framebuffer from the firmware. When it failed I discovered
> that the bus address was wrong.
>
> What I don't understand is why mmc and spi work fine with a "wrong"
> bus address. It's only the framebuffer driver and the vchiq driver,
> when using the mailbox, that fail.

It's possible this is just a fluke. After all, having the wrong value
for the upper 2 bits of DMA-mastered accesses will only have any effect
if there's a live entry in the cache for that address. Of course, as Eric
says, different peripherals may also treat the invalid 0x0 value
differently.

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-05 19:29 ` Stephen Warren
@ 2015-05-05 19:53   ` Eric Anholt
  2015-05-13  8:51     ` Lee Jones
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Anholt @ 2015-05-05 19:53 UTC
  To: linux-arm-kernel

Stephen Warren <swarren@wwwdotorg.org> writes:

> On 05/04/2015 01:33 PM, Eric Anholt wrote:
>> [...]
>
>> diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
>
>>   		#address-cells = <1>;
>>   		#size-cells = <1>;
>>   		ranges = <0x7e000000 0x20000000 0x02000000>;
>> +		dma-ranges = <0x40000000 0x00000000 0x1f000000>;
>
> Oh well that's a nice and simple patch; I had been avoiding looking into 
> fixing the kernel for this since I was worried it'd be rather complex!
>
> I'm puzzled why the length cell of ranges and dma-ranges differs though? 
> Assuming there's a good explanation for that,

Nope, you're right, it should be 0x20000000.  The '0x1f' came from
working backward from the '0x3f' on the pi2, but the pi2 only needs that
because a chunk of its address space is lost to the bus mapping.

* [PATCH v2] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-04 19:33 [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM Eric Anholt
  2015-05-04 20:25 ` Noralf Trønnes
  2015-05-05 19:29 ` Stephen Warren
@ 2015-05-05 20:10 ` Eric Anholt
  2015-05-14  8:43   ` Lee Jones
  2 siblings, 1 reply; 11+ messages in thread
From: Eric Anholt @ 2015-05-05 20:10 UTC
  To: linux-arm-kernel

There exists a tiny MMU, configurable only by the VC (running the
closed firmware), which maps from the ARM's physical addresses to bus
addresses.  These bus addresses determine the caching behavior in the
VC's L1/L2 (note: separate from the ARM's L1/L2) according to the top
2 bits.  The bits in the bus address mean:

From the VideoCore processor:
0x0... L1 and L2 cache allocating and coherent
0x4... L1 non-allocating, but coherent. L2 allocating and coherent
0x8... L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

From the GPU peripherals (note: all peripherals bypass the L1
cache. The ARM sees this view once its accesses pass through the VC MMU):
0x0... Do not use
0x4... L1 non-allocating, and incoherent. L2 allocating and coherent
0x8... L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

The 2835 firmware always configures the MMU to turn ARM physical
addresses with 0x0 top bits to 0x4, meaning present in L2 but
incoherent with L1.  However, any bus addresses we were generating in
the kernel to be passed to a device had 0x0 top bits.  That would be a
reserved (possibly totally incoherent) value if sent to a GPU
peripheral like USB, or L1 allocating if sent to the VC (like a
firmware property request).  By setting dma-ranges, all of the devices
below it get a dev->dma_pfn_offset, so that dma_alloc_coherent() and
friends return addresses with 0x4 bits and avoid cache incoherency.

This matches the behavior in the downstream 2708 kernel (see
BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h).

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: popcornmix at gmail.com
---

v2: Fix length of the range from 0x1f000000 to 0x20000000, fixing the
    translation for the last 16MB.
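The 16MB figure follows from the two length cells: 0x20000000 - 0x1f000000 = 0x01000000 bytes. The C sketch below checks that arithmetic and shows an address in the last 16MB of the 512MB window that the v1 entry missed and v2 covers; DMA_LEN_V1, DMA_LEN_V2 and in_window() are illustrative names, and the window is assumed to start at ARM physical 0 as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Length cells of the v1 and v2 dma-ranges entries. */
#define DMA_LEN_V1 0x1f000000u
#define DMA_LEN_V2 0x20000000u

/* C11 compile-time check: v2 adds exactly 16MB (0x01000000 bytes). */
static_assert(DMA_LEN_V2 - DMA_LEN_V1 == 16u * 1024u * 1024u,
	      "v2 extends the window by 16MB");

/* True if an ARM physical address is inside a window starting at 0. */
static bool in_window(uint32_t len, uint32_t phys)
{
	return phys < len;
}
```

For example, ARM physical 0x1f800000 is not translated under v1, while under v2 it becomes bus address 0x5f800000 (0x40000000 + 0x1f800000).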

 arch/arm/boot/dts/bcm2835.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
index eb33a8c..3c899b3 100644
--- a/arch/arm/boot/dts/bcm2835.dtsi
+++ b/arch/arm/boot/dts/bcm2835.dtsi
@@ -15,6 +15,7 @@
 		#address-cells = <1>;
 		#size-cells = <1>;
 		ranges = <0x7e000000 0x20000000 0x02000000>;
+		dma-ranges = <0x40000000 0x00000000 0x20000000>;
 
 		timer@7e003000 {
 			compatible = "brcm,bcm2835-system-timer";
-- 
2.1.4

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-05 19:53   ` Eric Anholt
@ 2015-05-13  8:51     ` Lee Jones
  2015-05-13 17:41       ` Eric Anholt
  0 siblings, 1 reply; 11+ messages in thread
From: Lee Jones @ 2015-05-13  8:51 UTC
  To: linux-arm-kernel

On Tue, 05 May 2015, Eric Anholt wrote:

> Stephen Warren <swarren@wwwdotorg.org> writes:
> 
> > On 05/04/2015 01:33 PM, Eric Anholt wrote:
> >> [...]
> >
> >> diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
> >
> >>   		#address-cells = <1>;
> >>   		#size-cells = <1>;
> >>   		ranges = <0x7e000000 0x20000000 0x02000000>;
> >> +		dma-ranges = <0x40000000 0x00000000 0x1f000000>;
> >
> > Oh well that's a nice and simple patch; I had been avoiding looking into 
> > fixing the kernel for this since I was worried it'd be rather complex!
> >
> > I'm puzzled why the length cell of ranges and dma-ranges differs though? 
> > Assuming there's a good explanation for that,
> 
> Nope, you're right, it should be 0x20000000.  '0x1f' came from going
> back from the '0x3f' on the pi2, but pi2 just has a chunk lost to the
> bus mapping.

So are you going to fix this and send another patch?

* [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-13  8:51     ` Lee Jones
@ 2015-05-13 17:41       ` Eric Anholt
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Anholt @ 2015-05-13 17:41 UTC
  To: linux-arm-kernel

Lee Jones <lee@kernel.org> writes:

> On Tue, 05 May 2015, Eric Anholt wrote:
>
>> Stephen Warren <swarren@wwwdotorg.org> writes:
>> 
>> > On 05/04/2015 01:33 PM, Eric Anholt wrote:
>> >> There exists a tiny MMU, configurable only by the VC (running the
>> >> closed firmware), which maps from the ARM's physical addresses to bus
>> >> addresses.  These bus addresses determine the caching behavior in the
>> >> VC's L1/L2 (note: separate from the ARM's L1/L2) according to the top
>> >> 2 bits.  The bits in the bus address mean:
>> >>
>> >>  From the VideoCore processor:
>> >> 0x0... L1 and L2 cache allocating and coherent
>> >> 0x4... L1 non-allocating, but coherent. L2 allocating and coherent
>> >> 0x8... L1 non-allocating, but coherent. L2 non-allocating, but coherent
>> >> 0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
>> >>
>> >>  From the GPU peripherals (note: all peripherals bypass the L1
>> >> cache. The ARM will see this view once through the VC MMU):
>> >> 0x0... Do not use
>> >> 0x4... L1 non-allocating, and incoherent. L2 allocating and coherent.
>> >> 0x8... L1 non-allocating, and incoherent. L2 non-allocating, but coherent
>> >> 0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
>> >>
>> >> The 2835 firmware always configures the MMU to turn ARM physical
>> >> addresses with 0x0 top bits to 0x4, meaning present in L2 but
>> >> incoherent with L1.  However, any bus addresses we were generating in
>> >> the kernel to be passed to a device had 0x0 bits.  That would be a
>> >> reserved (possibly totally incoherent) value if sent to a GPU
>> >> peripheral like USB, or L1 allocating if sent to the VC (like a
>> >> firmware property request).  By setting dma-ranges, all of the devices
>> >> below it get a dev->dma_pfn_offset, so that dma_alloc_coherent() and
>> >> friends return addresses with 0x4 bits and avoid cache incoherency.
>> >>
>> >> This matches the behavior in the downstream 2708 kernel (see
>> >> BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h).
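The two alias tables in the commit message can be read as a function of the top two bits of a 32-bit bus address. The sketch below is a hypothetical illustration: enum vc_alias and the vc_* helpers are invented names, not kernel or firmware API. It extracts bits 31:30 as the 0x0/0x4/0x8/0xc nibble that the tables use, and flags the 0x0 alias that the tables mark "Do not use" for GPU peripherals.

```c
#include <stdint.h>

/* Alias selected by the top two bits of a VideoCore bus address,
 * as described in the commit message's tables (hypothetical names). */
enum vc_alias {
	VC_ALIAS_0 = 0x0,	/* VC: L1+L2 allocating; peripherals: do not use */
	VC_ALIAS_4 = 0x4,	/* L1 non-allocating, L2 allocating */
	VC_ALIAS_8 = 0x8,	/* L1 and L2 non-allocating */
	VC_ALIAS_C = 0xc,	/* SDRAM alias, caches bypassed */
};

/* Keep only bits 31:30 and return them as the top nibble value. */
static enum vc_alias vc_bus_alias(uint32_t bus_addr)
{
	return (enum vc_alias)((bus_addr >> 28) & 0xc);
}

/* Nonzero if the address uses an alias the tables allow for a GPU
 * peripheral such as USB, i.e. anything other than 0x0. */
static int vc_alias_ok_for_peripheral(uint32_t bus_addr)
{
	return vc_bus_alias(bus_addr) != VC_ALIAS_0;
}
```

An unmodified ARM physical address such as 0x00100000 decodes to the 0x0 alias, which is the problem the patch describes; the same page at 0x40100000 decodes to 0x4.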
>> >
>> >> diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
>> >
>> >>   		#address-cells = <1>;
>> >>   		#size-cells = <1>;
>> >>   		ranges = <0x7e000000 0x20000000 0x02000000>;
>> >> +		dma-ranges = <0x40000000 0x00000000 0x1f000000>;
>> >
>> > Oh well that's a nice and simple patch; I had been avoiding looking into 
>> > fixing the kernel for this since I was worried it'd be rather complex!
>> >
>> > I'm puzzled why the length cell of ranges and dma-ranges differs though? 
>> > Assuming there's a good explanation for that,
>> 
>> Nope, you're right, it should be 0x20000000.  '0x1f' came from going
>> back from the '0x3f' on the pi2, but pi2 just has a chunk lost to the
>> bus mapping.
>
> So are you going to fix this and send another patch?

I see it having hit the list:

http://lists.infradead.org/pipermail/linux-rpi-kernel/2015-May/001699.html

but I'm missing both versions in my inbox, so I'm not sure what
happened.

* [PATCH v2] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM.
  2015-05-05 20:10 ` [PATCH v2] " Eric Anholt
@ 2015-05-14  8:43   ` Lee Jones
  0 siblings, 0 replies; 11+ messages in thread
From: Lee Jones @ 2015-05-14  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 05 May 2015, Eric Anholt wrote:

> There exists a tiny MMU, configurable only by the VC (running the
> closed firmware), which maps from the ARM's physical addresses to bus
> addresses.  These bus addresses determine the caching behavior in the
> VC's L1/L2 (note: separate from the ARM's L1/L2) according to the top
> 2 bits.  The bits in the bus address mean:
> 
> From the VideoCore processor:
> 0x0... L1 and L2 cache allocating and coherent
> 0x4... L1 non-allocating, but coherent. L2 allocating and coherent
> 0x8... L1 non-allocating, but coherent. L2 non-allocating, but coherent
> 0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
> 
> From the GPU peripherals (note: all peripherals bypass the L1
> cache. The ARM will see this view once through the VC MMU):
> 0x0... Do not use
> 0x4... L1 non-allocating, and incoherent. L2 allocating and coherent.
> 0x8... L1 non-allocating, and incoherent. L2 non-allocating, but coherent
> 0xc... SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent
> 
> The 2835 firmware always configures the MMU to turn ARM physical
> addresses with 0x0 top bits to 0x4, meaning present in L2 but
> incoherent with L1.  However, any bus addresses we were generating in
> the kernel to be passed to a device had 0x0 bits.  That would be a
> reserved (possibly totally incoherent) value if sent to a GPU
> peripheral like USB, or L1 allocating if sent to the VC (like a
> firmware property request).  By setting dma-ranges, all of the devices
> below it get a dev->dma_pfn_offset, so that dma_alloc_coherent() and
> friends return addresses with 0x4 bits and avoid cache incoherency.
> 
> This matches the behavior in the downstream 2708 kernel (see
> BUS_OFFSET in arch/arm/mach-bcm2708/include/mach/memory.h).
> 
> Signed-off-by: Eric Anholt <eric@anholt.net>
> Tested-by: Noralf Trønnes <noralf@tronnes.org>
> Acked-by: Stephen Warren <swarren@wwwdotorg.org>
> Cc: popcornmix at gmail.com

Applied, thanks.

> ---
> 
> v2: Fix length of the range from 0x1f000000 to 0x20000000, fixing the
>     translation for the last 16MB.
> 
>  arch/arm/boot/dts/bcm2835.dtsi | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/boot/dts/bcm2835.dtsi b/arch/arm/boot/dts/bcm2835.dtsi
> index eb33a8c..3c899b3 100644
> --- a/arch/arm/boot/dts/bcm2835.dtsi
> +++ b/arch/arm/boot/dts/bcm2835.dtsi
> @@ -15,6 +15,7 @@
>  		#address-cells = <1>;
>  		#size-cells = <1>;
>  		ranges = <0x7e000000 0x20000000 0x02000000>;
> +		dma-ranges = <0x40000000 0x00000000 0x20000000>;
>  
>  		timer@7e003000 {
>  			compatible = "brcm,bcm2835-system-timer";
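With the v2 line dma-ranges = <0x40000000 0x00000000 0x20000000>, every ARM physical address in [0x00000000, 0x20000000) is given to devices at that address plus 0x40000000, that is in 0x40000000..0x5fffffff, whose top two bits select the 0x4 alias. The kernel carries this as dev->dma_pfn_offset, in page-frame units, per the commit message. The sketch below instead models it with byte addresses, and the VC_DMA_* macros and vc_* helpers are hypothetical, not kernel API.

```c
#include <stdint.h>

/* The three cells of the v2 dma-ranges entry (hypothetical macro names). */
#define VC_DMA_BUS_BASE	0x40000000u	/* child (bus) address */
#define VC_DMA_CPU_BASE	0x00000000u	/* parent (ARM physical) address */
#define VC_DMA_SIZE	0x20000000u	/* length: 512 MiB */

/* ARM physical -> bus address. Returns 1 and stores the result on
 * success, or returns 0 (leaving *bus untouched) outside the window. */
static int vc_phys_to_bus(uint32_t phys, uint32_t *bus)
{
	/* Unsigned wrap makes this a combined lower/upper bound check. */
	if ((uint32_t)(phys - VC_DMA_CPU_BASE) >= VC_DMA_SIZE)
		return 0;
	*bus = phys - VC_DMA_CPU_BASE + VC_DMA_BUS_BASE;
	return 1;
}

/* Bus -> ARM physical address, the inverse of vc_phys_to_bus(). */
static int vc_bus_to_phys(uint32_t bus, uint32_t *phys)
{
	if ((uint32_t)(bus - VC_DMA_BUS_BASE) >= VC_DMA_SIZE)
		return 0;
	*phys = bus - VC_DMA_BUS_BASE + VC_DMA_CPU_BASE;
	return 1;
}
```

For example, vc_phys_to_bus(0x00100000, &b) stores 0x40100000 in b, and the last byte of SDRAM, 0x1fffffff, maps to 0x5fffffff; with the v1 length of 0x1f000000 that last address would have fallen outside the window.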

end of thread

Thread overview: 11+ messages
2015-05-04 19:33 [PATCH] ARM: bcm2835: Use 0x4 prefix for DMA bus addresses to SDRAM Eric Anholt
2015-05-04 20:25 ` Noralf Trønnes
2015-05-05  0:07   ` Eric Anholt
2015-05-05 13:33     ` Noralf Trønnes
2015-05-05 19:31   ` Stephen Warren
2015-05-05 19:29 ` Stephen Warren
2015-05-05 19:53   ` Eric Anholt
2015-05-13  8:51     ` Lee Jones
2015-05-13 17:41       ` Eric Anholt
2015-05-05 20:10 ` [PATCH v2] " Eric Anholt
2015-05-14  8:43   ` Lee Jones
