* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
@ 2011-03-25 10:16 Catalin Marinas
2011-03-29 15:02 ` Martin Furmanski
0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2011-03-25 10:16 UTC (permalink / raw)
To: linux-arm-kernel
Since mandatory barriers may be used (explicitly or implicitly via readl
etc.) to ensure the ordering between Device and Normal memory accesses,
a DMB is not enough. This patch converts it to a DSB.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/system.h | 2 +-
arch/arm/mach-realview/include/mach/barriers.h | 2 +-
arch/arm/mach-tegra/include/mach/barriers.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 9a87823..926d867 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -159,7 +159,7 @@ extern unsigned int user_debug;
#include <mach/barriers.h>
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
-#define rmb() dmb()
+#define rmb() dsb()
#define wmb() mb()
#else
#include <asm/memory.h>
diff --git a/arch/arm/mach-realview/include/mach/barriers.h b/arch/arm/mach-realview/include/mach/barriers.h
index 0c5d749..9a73219 100644
--- a/arch/arm/mach-realview/include/mach/barriers.h
+++ b/arch/arm/mach-realview/include/mach/barriers.h
@@ -4,5 +4,5 @@
* operation to deadlock the system.
*/
#define mb() dsb()
-#define rmb() dmb()
+#define rmb() dsb()
#define wmb() mb()
diff --git a/arch/arm/mach-tegra/include/mach/barriers.h b/arch/arm/mach-tegra/include/mach/barriers.h
index cc11517..425b42e 100644
--- a/arch/arm/mach-tegra/include/mach/barriers.h
+++ b/arch/arm/mach-tegra/include/mach/barriers.h
@@ -23,7 +23,7 @@
#include <asm/outercache.h>
-#define rmb() dmb()
+#define rmb() dsb()
#define wmb() do { dsb(); outer_sync(); } while (0)
#define mb() wmb()
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-03-25 10:16 [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses Catalin Marinas
@ 2011-03-29 15:02 ` Martin Furmanski
2011-03-29 15:21 ` Catalin Marinas
0 siblings, 1 reply; 12+ messages in thread
From: Martin Furmanski @ 2011-03-29 15:02 UTC (permalink / raw)
To: linux-arm-kernel
Do you have a reference on this?
I have been under the impression that DMB is a barrier for all memory
accesses. I find no support in ARMv7, for the hypothesis that DSB is needed
to order between Device and Normal.
Regards,
Martin Furmanski
On Fri, Mar 25, 2011 at 11:16 AM, Catalin Marinas
<catalin.marinas@arm.com>wrote:
> Since mandatory barriers may be used (explicitly or implicitly via readl
> etc.) to ensure the ordering between Device and Normal memory accesses,
> a DMB is not enough. This patch converts it to a DSB.
>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Colin Cross <ccross@android.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
> arch/arm/include/asm/system.h | 2 +-
> arch/arm/mach-realview/include/mach/barriers.h | 2 +-
> arch/arm/mach-tegra/include/mach/barriers.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
> index 9a87823..926d867 100644
> --- a/arch/arm/include/asm/system.h
> +++ b/arch/arm/include/asm/system.h
> @@ -159,7 +159,7 @@ extern unsigned int user_debug;
> #include <mach/barriers.h>
> #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
> #define mb() do { dsb(); outer_sync(); } while (0)
> -#define rmb() dmb()
> +#define rmb() dsb()
> #define wmb() mb()
> #else
> #include <asm/memory.h>
> diff --git a/arch/arm/mach-realview/include/mach/barriers.h
> b/arch/arm/mach-realview/include/mach/barriers.h
> index 0c5d749..9a73219 100644
> --- a/arch/arm/mach-realview/include/mach/barriers.h
> +++ b/arch/arm/mach-realview/include/mach/barriers.h
> @@ -4,5 +4,5 @@
> * operation to deadlock the system.
> */
> #define mb() dsb()
> -#define rmb() dmb()
> +#define rmb() dsb()
> #define wmb() mb()
> diff --git a/arch/arm/mach-tegra/include/mach/barriers.h
> b/arch/arm/mach-tegra/include/mach/barriers.h
> index cc11517..425b42e 100644
> --- a/arch/arm/mach-tegra/include/mach/barriers.h
> +++ b/arch/arm/mach-tegra/include/mach/barriers.h
> @@ -23,7 +23,7 @@
>
> #include <asm/outercache.h>
>
> -#define rmb() dmb()
> +#define rmb() dsb()
> #define wmb() do { dsb(); outer_sync(); } while (0)
> #define mb() wmb()
>
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110329/af7c1441/attachment-0001.html>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-03-29 15:02 ` Martin Furmanski
@ 2011-03-29 15:21 ` Catalin Marinas
[not found] ` <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com>
2011-04-07 7:07 ` Ming Lei
0 siblings, 2 replies; 12+ messages in thread
From: Catalin Marinas @ 2011-03-29 15:21 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2011-03-29 at 16:02 +0100, Martin Furmanski wrote:
> Do you have a reference on this?
Usually the ARM ARM but a document with examples is this:
http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf
(infocenter.arm.com -> Developer Guides and Articles -> ARM architecture
-> Barrier Litmus and Tests Cookbook)
> I have been under the impression that DMB is a barrier for all memory
> accesses. I find no support in ARMv7, for the hypothesis that DSB is
> needed to order between Device and Normal.
The key point is that DMB only ensures the *observability* of memory
accesses by the processors and not arrival to the device or block of
memory. In the drawing below, DMB only ensures the ordering at point
(1). For arrival to RAM or Device you need a DSB.
+------+ +------+
| P1 | | P2 |
+------+ +------+
| (1) |
+------+------+
|
+-------------+-------------+
| | |
+------+ +------+ +--------+
| RAM1 | | RAM2 | | Device |
+------+ +------+ +--------+
--
Catalin
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
[not found] ` <1301416642.583.101.camel@e102109-lin.cambridge.arm.com>
@ 2011-03-29 21:06 ` Martin Furmanski
2011-03-30 8:53 ` Catalin Marinas
0 siblings, 1 reply; 12+ messages in thread
From: Martin Furmanski @ 2011-03-29 21:06 UTC (permalink / raw)
To: linux-arm-kernel
These statements appear not even remotely valid. Certainly the CPU is only a
bus master, so when you say slave/master memory access, what are you
referring to?
(Sorry for accidentally bringing the thread off the list for the last 2
posts. Bringing it back.)
Best Regards,
Martin Furmanski
On Tue, Mar 29, 2011 at 6:37 PM, Catalin Marinas <catalin.marinas@arm.com>wrote:
> On Tue, 2011-03-29 at 17:16 +0100, Martin Furmanski wrote:
> > From memory_barriers.txt, regarding rmb():
> >
> > " (3) Read (or load) memory barriers.
> >
> > A read barrier is a data dependency barrier plus a guarantee that
> all the
> > LOAD operations specified before the barrier will appear to happen
> before
> > all the LOAD operations specified after the barrier with respect to
> the
> > other components of the system."
> >
> > If above is the semantics that you are aiming for, then it is still
> irrelevant whether
> > the memory is Normal or Device. But other than that, I see no issues.
>
> The relevant part is that a Device access is usually a *slave* access
> while Normal memory access is a *master* one. The DMB only ensures the
> latter (as specified by the ARM ARM, though probably not very clear).
>
> I'm not sure whether the memory-barriers.txt doc covers all the usage in
> the kernel but the mandatory barriers (mb/rmb/wmb) may be used to
> control the effect of I/O accesses (as per the above document). On ARM,
> this needs to be DSB rather than DMB.
>
> I can even give examples where a DMB would fail with regards to Device
> memory.
>
> --
> Catalin
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110329/06e8ffc3/attachment-0001.html>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-03-29 21:06 ` Martin Furmanski
@ 2011-03-30 8:53 ` Catalin Marinas
0 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2011-03-30 8:53 UTC (permalink / raw)
To: linux-arm-kernel
(could you please avoid top-posting, it makes it harder to follow?)
On Tue, 2011-03-29 at 22:06 +0100, Martin Furmanski wrote:
> These statements appear not even remotely valid. Certainly the CPU is
> only a bus master, so when you say slave/master memory access, what
> are you referring to?
What I meant by master is memory access that the CPU initiates. By slave
I was referring to accesses that the device or block of memory receive
as a result of a bus master request. You can rephrase this if you want.
On the DMB - it only ensures that the bus master requests are ordered,
it doesn't care about the request arrival at a device or block of
memory.
> (Sorry for accidentally bringing the thread off the list for the last
> 2 posts. Bringing it back.)
It looks like your messages are held in a moderation queue (you may want
to subscribe or wait for the moderator to approve them).
--
Catalin
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-03-29 15:21 ` Catalin Marinas
[not found] ` <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com>
@ 2011-04-07 7:07 ` Ming Lei
2011-04-09 8:57 ` Catalin Marinas
1 sibling, 1 reply; 12+ messages in thread
From: Ming Lei @ 2011-04-07 7:07 UTC (permalink / raw)
To: linux-arm-kernel
Hi Catalin,
2011/3/29 Catalin Marinas <catalin.marinas@arm.com>:
> On Tue, 2011-03-29 at 16:02 +0100, Martin Furmanski wrote:
>> Do you have a reference on this?
>
> Usually the ARM ARM but a document with examples is this:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf
After glancing over the above, I find DSB is only applied in the WFE/WFI
example in section 6, but all other examples in this section do use DMB.
So could you point out which example is the reference for the mandatory DSB
wrt. read memory barrier?
> (infocenter.arm.com -> Developer Guides and Articles -> ARM architecture
> -> Barrier Litmus and Tests Cookbook)
>
>> I have been under the impression that DMB is a barrier for all memory
>> accesses. I find no support in ARMv7, for the hypothesis that DSB is
>> needed to order between Device and Normal.
>
> The key point is that DMB only ensures the *observability* of memory
> accesses by the processors and not arrival to the device or block of
How could you conclude that the memory accesses order is different with
the order of memory requests observed on the same type of memory?
> memory. In the drawing below, DMB only ensures the ordering at point
> (1). For arrival to RAM or Device you need a DSB.
>
> ? ? ? +------+ ? ? +------+
> ? ? ? | ?P1 ?| ? ? | ?P2 ?|
> ? ? ? +------+ ? ? +------+
> ? ? ? ? ?| ? ? (1) ? ? |
> ? ? ? ? ?+------+------+
> ? ? ? ? ? ? ? ? |
> ? +-------------+-------------+
> ? | ? ? ? ? ? ? | ? ? ? ? ? ? |
> +------+ ? ? +------+ ? ? +--------+
> | RAM1 | ? ? | RAM2 | ? ? | Device |
> +------+ ? ? +------+ ? ? +--------+
thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-04-07 7:07 ` Ming Lei
@ 2011-04-09 8:57 ` Catalin Marinas
2011-04-09 17:03 ` Ming Lei
0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2011-04-09 8:57 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On 7 April 2011 10:07, Ming Lei <tom.leiming@gmail.com> wrote:
> 2011/3/29 Catalin Marinas <catalin.marinas@arm.com>:
>> On Tue, 2011-03-29 at 16:02 +0100, Martin Furmanski wrote:
>>> Do you have a reference on this?
>>
>> Usually the ARM ARM but a document with examples is this:
>>
>> http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf
>
> After glancing over the above, I find DSB is only applied in the WFE/WFI
> example in section 6, but all other examples in this section do use DMB.
>
> So could you point out which example is the reference for the mandatory DSB
> wrt. read memory barrier?
Probably the above document isn't comprehensive enough. It mainly
targets memory ordering between processors. I think another example
that mentions DSB is the mailbox scenario.
Anyway, my patch is based on the discussions I had with the person
that wrote the above document (and the ARM ARM).
>>> I have been under the impression that DMB is a barrier for all memory
>>> accesses. I find no support in ARMv7, for the hypothesis that DSB is
>>> needed to order between Device and Normal.
>>
>> The key point is that DMB only ensures the *observability* of memory
>> accesses by the processors and not arrival to the device or block of
>
> How could you conclude that the memory accesses order is different with
> the order of memory requests observed on the same type of memory?
I don't fully understand your question. But I'll give an example where
the DMB fails.
Let's assume we have a device that performs the two steps below:
1. Writes data to RAM
2. Updates its status register
A driver running on the CPU has some code as below:
LDR [Device] @ read the device status
DMB @ current barrier that we have in readl
TST @ check whether the DMA transfer is ready
BEQ out
LDR [Normal] @ read the DMA buffer
...
out:
With the code above, the CPU may do the following steps:
1. Issue read from the device. Note that it does not wait for the read
to complete.
2. DMB - ensures that no subsequent memory accesses happen before the
previous ones.
3. Issues read from normal memory speculatively. This is allowed
because the TST/BEQ are only flow control dependency. In case the
condition fails, the read is discarded.
4. The read from Normal memory (DMA buffer) completes. This could
happen before the I/O read at point 1 depending on the bus speeds.
5. The Device read completes. This can happen after the Normal read
because of different bus speeds.
6. TST clears CPSR.Z
7. BEQ not executed.
8. Normal read data moved to register.
So, even if the CPU issues the read from Device and Normal memory in
order (steps 1, 3), they can happen at the device and RAM level out of
order (steps 4, 5) and the CPU could read data not yet written by the
device.
The solution is to use a DSB which ensures the completion of the
Device read before issuing the Normal memory read.
Note that if the device would update the ready state in some
memory-mapped register via the same port as the CPU, a DMB would be
enough (IOW, ordering ensured only for accesses initiated by bus
masters).
Hope this helps.
--
Catalin
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-04-09 8:57 ` Catalin Marinas
@ 2011-04-09 17:03 ` Ming Lei
2011-04-10 10:10 ` Catalin Marinas
0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2011-04-09 17:03 UTC (permalink / raw)
To: linux-arm-kernel
Hi Catalin,
2011/4/9 Catalin Marinas <catalin.marinas@arm.com>:
> Probably the above document isn't comprehensive enough. It mainly
> targets memory ordering between processors. I think another example
> that mentions DSB is the mailbox scenario.
I also saw this example of 8.1, but I don't think it is related with
linux read memory barrier, see below:
- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before
'LDR R5, [R1]' in P2;
- Documentation/memory-barriers.txt said:
1), Memory barriers are such interventions. They impose a
perceived partial ordering over the memory operations on either
side of the barrier.
2),There is no guarantee that any of the memory accesses specified
before a memory barrier will be _complete_ by the completion of a
memory barrier instruction; the barrier can be considered to draw a
line in that CPU's access queue that accesses of the appropriate type
may not cross.
In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead
of in P1 because of the belows:
- out of order of the two stores in P1 is allowable
- P2 should get message address from the mailbox pointed by R4 first,
so how can we make sure the 'STR R1, [R4]' in P1 is completed if
R4 doesn't point to a Strongly-Ordered memory.
> Anyway, my patch is based on the discussions I had with the person
> that wrote the above document (and the ARM ARM).
>
>>>> I have been under the impression that DMB is a barrier for all memory
>>>> accesses. I find no support in ARMv7, for the hypothesis that DSB is
>>>> needed to order between Device and Normal.
>>>
>>> The key point is that DMB only ensures the *observability* of memory
>>> accesses by the processors and not arrival to the device or block of
>>
>> How could you conclude that the memory accesses order is different with
>> the order of memory requests observed on the same type of memory?
>
> I don't fully understand your question. But I'll give an example where
> the DMB fails.
>
> Let's assume we have a device that performs the two steps below:
>
> 1. Writes data to RAM
> 2. Updates its status register
>
> A driver running on the CPU has some code as below:
>
> ? ?LDR [Device] ? ?@ read the device status
> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl
> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready
> ? ?BEQ out
> ? ?LDR [Normal] ? ?@ read the DMA buffer
> ? ?...
> out:
>
> With the code above, the CPU may do the following steps:
>
> 1. Issue read from the device. Note that it does not wait for the read
> to complete.
> 2. DMB - ensures that no subsequent memory accesses happen before the
> previous ones.
> 3. Issues read from normal memory speculatively. This is allowed
> because the TST/BEQ are only flow control dependency. In case the
> condition fails, the read is discarded.
> 4. The read from Normal memory (DMA buffer) completes. This could
> happen before the I/O read at point 1 depending on the bus speeds.
> 5. The Device read completes. This can happen after the Normal read
> because of different bus speeds.
> 6. TST clears CPSR.Z
> 7. BEQ not executed.
> 8. Normal read data moved to register.
>
> So, even if the CPU issues the read from Device and Normal memory in
> order (steps 1, 3), they can happen at the device and RAM level out of
> order (steps 4, 5) and the CPU could read data not yet written by the
> device.
>
> The solution is to use a DSB which ensures the completion of the
> Device read before issuing the Normal memory read.
I agree a DSB is needed in such case, but I am not sure the patch is
good. Maybe we should keep rmb not changed and only replace __iormb
as DSB if this issue only happens between device io memory and normal
memory, then rmb will not degrade performance a little as does by your
patch.
thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-04-09 17:03 ` Ming Lei
@ 2011-04-10 10:10 ` Catalin Marinas
2011-04-11 5:43 ` Ming Lei
0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2011-04-10 10:10 UTC (permalink / raw)
To: linux-arm-kernel
On 9 April 2011 20:03, Ming Lei <tom.leiming@gmail.com> wrote:
> 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>:
>> Probably the above document isn't comprehensive enough. It mainly
>> targets memory ordering between processors. I think another example
>> that mentions DSB is the mailbox scenario.
>
> I also saw this example of 8.1, but I don't think it is related with
> linux read memory barrier, see below:
>
> ? ? ? ?- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before
> ? ? ? ?'LDR R5, [R1]' in P2;
Its not related to the read memory barrier but it refers to ordering
between accesses to Normal memory and Device memory. The mailbox here
at [R4] can be a device. I'm copying the relevant code here for others
following this thread:
P1:
STR R5, [R1] ; message stored to shared memory location
DSB [ST]
STR R1, [R4] ; R4 contains the address of a mailbox
P2:
; interrupt service routine
LDR R5, [R1]
> ? ? ? ?- Documentation/memory-barriers.txt said:
> ? ? ? ?1), Memory barriers are such interventions. ?They impose a
> ? ? ? ?perceived partial ordering over the memory operations on either
> ? ? ? ?side of the barrier.
>
> ? ? ? ?2),There is no guarantee that any of the memory accesses specified
> ? ? ? ?before a memory barrier will be _complete_ by the completion of a
> ? ? ? ?memory barrier instruction; the barrier can be considered to draw a
> ? ? ? ?line in that CPU's access queue that accesses of the appropriate type
> ? ? ? ?may not cross.
While the above is correct, the Linux document is quite vague in
relation to the mandatory barriers and it's not clear whether the
above only refers to the SMP barriers.
> In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead
> of in P1 because of the belows:
A DSB on one CPU does not control the ordering on another CPU. So the
DSB on P2 is useless. Note that the P2 code is executed in an
interrupt routine raised as a result of STR R1, [R4]. If P2 had a loop
polling for [R4] to get a valid address, we would indeed need a DSB
before LDR R5, [R1].
> ? ? ? ?- out of order of the two stores in P1 is allowable
That's why we have a DSB in there, to ensure that the [R1] location is
written before we store the R1 to the [R4] device.
> ? ? ? ?- P2 should get message address from the mailbox pointed by R4 first,
> ? ? ? ?so how can we make sure the 'STR R1, [R4]' in P1 is completed if
> ? ? ? ?R4 doesn't point to a Strongly-Ordered memory.
Even if the memory is SO, on newer CPUs I don't think it guarantees
the completion (pretty much acting like Device memory). But here we
don't care when the store to [R4] completes. When this happen, P2 will
get an interrupt. P2 does not execute the LDR R5, [R1] in interrupt
routine speculatively (i.e. before the interrupt was received).
>> Let's assume we have a device that performs the two steps below:
>>
>> 1. Writes data to RAM
>> 2. Updates its status register
>>
>> A driver running on the CPU has some code as below:
>>
>> ? ?LDR [Device] ? ?@ read the device status
>> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl
>> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready
>> ? ?BEQ out
>> ? ?LDR [Normal] ? ?@ read the DMA buffer
>> ? ?...
>> out:
>>
>> With the code above, the CPU may do the following steps:
>>
>> 1. Issue read from the device. Note that it does not wait for the read
>> to complete.
>> 2. DMB - ensures that no subsequent memory accesses happen before the
>> previous ones.
>> 3. Issues read from normal memory speculatively. This is allowed
>> because the TST/BEQ are only flow control dependency. In case the
>> condition fails, the read is discarded.
>> 4. The read from Normal memory (DMA buffer) completes. This could
>> happen before the I/O read at point 1 depending on the bus speeds.
>> 5. The Device read completes. This can happen after the Normal read
>> because of different bus speeds.
>> 6. TST clears CPSR.Z
>> 7. BEQ not executed.
>> 8. Normal read data moved to register.
>>
>> So, even if the CPU issues the read from Device and Normal memory in
>> order (steps 1, 3), they can happen at the device and RAM level out of
>> order (steps 4, 5) and the CPU could read data not yet written by the
>> device.
>>
>> The solution is to use a DSB which ensures the completion of the
>> Device read before issuing the Normal memory read.
>
> I agree a DSB is needed in such case, but I am not sure the patch is
> good. Maybe we should keep rmb not changed and only replace __iormb
> as DSB if this issue only happens between device io memory and normal
> memory, then rmb will not degrade performance a little as does by your
> patch.
What other use do you see for the mandatory barriers if not in
relation to I/O? They are not for SMP configurations. We already have
a DSB in wmb().
The mandatory barriers could be present in device drivers that use the
relaxed I/O accessors and mandatory barriers explicitly as
optimisation. This was the approach recommended in past mailing list
discussions.
--
Catalin
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-04-10 10:10 ` Catalin Marinas
@ 2011-04-11 5:43 ` Ming Lei
2011-04-28 18:45 ` Russell King - ARM Linux
0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2011-04-11 5:43 UTC (permalink / raw)
To: linux-arm-kernel
Hi Catalin,
2011/4/10 Catalin Marinas <catalin.marinas@arm.com>:
> On 9 April 2011 20:03, Ming Lei <tom.leiming@gmail.com> wrote:
>> 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>:
>>> Probably the above document isn't comprehensive enough. It mainly
>>> targets memory ordering between processors. I think another example
>>> that mentions DSB is the mailbox scenario.
>>
>> I also saw this example of 8.1, but I don't think it is related with
>> linux read memory barrier, see below:
>>
>> ? ? ? ?- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before
>> ? ? ? ?'LDR R5, [R1]' in P2;
>
> Its not related to the read memory barrier but it refers to ordering
> between accesses to Normal memory and Device memory. The mailbox here
> at [R4] can be a device. I'm copying the relevant code here for others
> following this thread:
>
> P1:
> ? ?STR R5, [R1] ; message stored to shared memory location
> ? ?DSB [ST]
> ? ?STR R1, [R4] ; R4 contains the address of a mailbox
>
> P2:
> ? ?; interrupt service routine
> ? ?LDR R5, [R1]
>
>> ? ? ? ?- Documentation/memory-barriers.txt said:
>> ? ? ? ?1), Memory barriers are such interventions. ?They impose a
>> ? ? ? ?perceived partial ordering over the memory operations on either
>> ? ? ? ?side of the barrier.
>>
>> ? ? ? ?2),There is no guarantee that any of the memory accesses specified
>> ? ? ? ?before a memory barrier will be _complete_ by the completion of a
>> ? ? ? ?memory barrier instruction; the barrier can be considered to draw a
>> ? ? ? ?line in that CPU's access queue that accesses of the appropriate type
>> ? ? ? ?may not cross.
>
> While the above is correct, the Linux document is quite vague in
> relation to the mandatory barriers and it's not clear whether the
> above only refers to the SMP barriers.
Yes, I agree it is vague about mandatory barriers, so make Paul involved in.
>
>> In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead
>> of in P1 because of the belows:
>
> A DSB on one CPU does not control the ordering on another CPU. So the
> DSB on P2 is useless. Note that the P2 code is executed in an
Yes, your are right, sorry for the noise about this part.
> interrupt routine raised as a result of STR R1, [R4]. If P2 had a loop
> polling for [R4] to get a valid address, we would indeed need a DSB
> before LDR R5, [R1].
>
>> ? ? ? ?- out of order of the two stores in P1 is allowable
>
> That's why we have a DSB in there, to ensure that the [R1] location is
> written before we store the R1 to the [R4] device.
Yes, I see now, I didn't know the interrupt is triggered by this before.
>
>> ? ? ? ?- P2 should get message address from the mailbox pointed by R4 first,
>> ? ? ? ?so how can we make sure the 'STR R1, [R4]' in P1 is completed if
>> ? ? ? ?R4 doesn't point to a Strongly-Ordered memory.
>
> Even if the memory is SO, on newer CPUs I don't think it guarantees
> the completion (pretty much acting like Device memory). But here we
> don't care when the store to [R4] completes. When this happen, P2 will
> get an interrupt. P2 does not execute the LDR R5, [R1] in interrupt
> routine speculatively (i.e. before the interrupt was received).
>>> Let's assume we have a device that performs the two steps below:
>>>
>>> 1. Writes data to RAM
>>> 2. Updates its status register
>>>
>>> A driver running on the CPU has some code as below:
>>>
>>> ? ?LDR [Device] ? ?@ read the device status
>>> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl
>>> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready
>>> ? ?BEQ out
>>> ? ?LDR [Normal] ? ?@ read the DMA buffer
>>> ? ?...
>>> out:
>>>
>>> With the code above, the CPU may do the following steps:
>>>
>>> 1. Issue read from the device. Note that it does not wait for the read
>>> to complete.
>>> 2. DMB - ensures that no subsequent memory accesses happen before the
>>> previous ones.
>>> 3. Issues read from normal memory speculatively. This is allowed
>>> because the TST/BEQ are only flow control dependency. In case the
>>> condition fails, the read is discarded.
>>> 4. The read from Normal memory (DMA buffer) completes. This could
>>> happen before the I/O read at point 1 depending on the bus speeds.
>>> 5. The Device read completes. This can happen after the Normal read
>>> because of different bus speeds.
>>> 6. TST clears CPSR.Z
>>> 7. BEQ not executed.
>>> 8. Normal read data moved to register.
>>>
>>> So, even if the CPU issues the read from Device and Normal memory in
>>> order (steps 1, 3), they can happen at the device and RAM level out of
>>> order (steps 4, 5) and the CPU could read data not yet written by the
>>> device.
>>>
>>> The solution is to use a DSB which ensures the completion of the
>>> Device read before issuing the Normal memory read.
>>
>> I agree a DSB is needed in such case, but I am not sure the patch is
>> good. Maybe we should keep rmb not changed and only replace __iormb
>> as DSB if this issue only happens between device io memory and normal
>> memory, then rmb will not degrade performance a little as does by your
>> patch.
>
> What other use do you see for the mandatory barriers if not in
> relation to I/O? They are not for SMP configurations. We already have
> a DSB in wmb().
Yes, there are some usage of rmb/wmb not in relation to I/O, and you can find
the uses in fs, kernel directory. Also rmb/wmb has been used to order coherent
memory accesses, such as dma descriptor(see tg3, fireware/ohci, musb,
ehci/ohci, ...)
So your patch may have performance effect on these drivers and others...
>
> The mandatory barriers could be present in device drivers that use the
> relaxed I/O accessors and mandatory barriers explicitly as
> optimisation. This was the approach recommended in past mailing list
> discussions.
I think the focus of the discussion is on your change to rmb.
--
Ming Lei
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-04-11 5:43 ` Ming Lei
@ 2011-04-28 18:45 ` Russell King - ARM Linux
2011-04-28 21:37 ` Catalin Marinas
0 siblings, 1 reply; 12+ messages in thread
From: Russell King - ARM Linux @ 2011-04-28 18:45 UTC (permalink / raw)
To: linux-arm-kernel
So, what am I supposed to do with 6870/1 (rmb) and the wmb patch?
One or other is wrong - they certainly are not both correct as rmb()
can not be stronger than wmb().
On Mon, Apr 11, 2011 at 01:43:35PM +0800, Ming Lei wrote:
> Hi Catalin,
>
> 2011/4/10 Catalin Marinas <catalin.marinas@arm.com>:
> > On 9 April 2011 20:03, Ming Lei <tom.leiming@gmail.com> wrote:
> >> 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>:
> >>> Probably the above document isn't comprehensive enough. It mainly
> >>> targets memory ordering between processors. I think another example
> >>> that mentions DSB is the mailbox scenario.
> >>
> >> I also saw this example of 8.1, but I don't think it is related with
> >> linux read memory barrier, see below:
> >>
> >> ? ? ? ?- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before
> >> ? ? ? ?'LDR R5, [R1]' in P2;
> >
> > Its not related to the read memory barrier but it refers to ordering
> > between accesses to Normal memory and Device memory. The mailbox here
> > at [R4] can be a device. I'm copying the relevant code here for others
> > following this thread:
> >
> > P1:
> > ? ?STR R5, [R1] ; message stored to shared memory location
> > ? ?DSB [ST]
> > ? ?STR R1, [R4] ; R4 contains the address of a mailbox
> >
> > P2:
> > ? ?; interrupt service routine
> > ? ?LDR R5, [R1]
> >
> >> ? ? ? ?- Documentation/memory-barriers.txt said:
> >> ? ? ? ?1), Memory barriers are such interventions. ?They impose a
> >> ? ? ? ?perceived partial ordering over the memory operations on either
> >> ? ? ? ?side of the barrier.
> >>
> >> ? ? ? ?2),There is no guarantee that any of the memory accesses specified
> >> ? ? ? ?before a memory barrier will be _complete_ by the completion of a
> >> ? ? ? ?memory barrier instruction; the barrier can be considered to draw a
> >> ? ? ? ?line in that CPU's access queue that accesses of the appropriate type
> >> ? ? ? ?may not cross.
> >
> > While the above is correct, the Linux document is quite vague in
> > relation to the mandatory barriers and it's not clear whether the
> > above only refers to the SMP barriers.
>
> Yes, I agree it is vague about mandatory barriers, so make Paul involved in.
>
> >
> >> In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead
> >> of in P1 because of the belows:
> >
> > A DSB on one CPU does not control the ordering on another CPU. So the
> > DSB on P2 is useless. Note that the P2 code is executed in an
>
> Yes, your are right, sorry for the noise about this part.
>
> > interrupt routine raised as a result of STR R1, [R4]. If P2 had a loop
> > polling for [R4] to get a valid address, we would indeed need a DSB
> > before LDR R5, [R1].
> >
> >> ? ? ? ?- out of order of the two stores in P1 is allowable
> >
> > That's why we have a DSB in there, to ensure that the [R1] location is
> > written before we store the R1 to the [R4] device.
>
> Yes, I see now, I didn't know the interrupt is triggered by this before.
>
> >
> >> ? ? ? ?- P2 should get message address from the mailbox pointed by R4 first,
> >> ? ? ? ?so how can we make sure the 'STR R1, [R4]' in P1 is completed if
> >> ? ? ? ?R4 doesn't point to a Strongly-Ordered memory.
> >
> > Even if the memory is SO, on newer CPUs I don't think it guarantees
> > the completion (pretty much acting like Device memory). But here we
> > don't care when the store to [R4] completes. When this happen, P2 will
> > get an interrupt. P2 does not execute the LDR R5, [R1] in interrupt
> > routine speculatively (i.e. before the interrupt was received).
> >>> Let's assume we have a device that performs the two steps below:
> >>>
> >>> 1. Writes data to RAM
> >>> 2. Updates its status register
> >>>
> >>> A driver running on the CPU has some code as below:
> >>>
> >>> ? ?LDR [Device] ? ?@ read the device status
> >>> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl
> >>> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready
> >>> ? ?BEQ out
> >>> ? ?LDR [Normal] ? ?@ read the DMA buffer
> >>> ? ?...
> >>> out:
> >>>
> >>> With the code above, the CPU may do the following steps:
> >>>
> >>> 1. Issue read from the device. Note that it does not wait for the read
> >>> to complete.
> >>> 2. DMB - ensures that no subsequent memory accesses happen before the
> >>> previous ones.
> >>> 3. Issues read from normal memory speculatively. This is allowed
> >>> because the TST/BEQ are only flow control dependency. In case the
> >>> condition fails, the read is discarded.
> >>> 4. The read from Normal memory (DMA buffer) completes. This could
> >>> happen before the I/O read at point 1 depending on the bus speeds.
> >>> 5. The Device read completes. This can happen after the Normal read
> >>> because of different bus speeds.
> >>> 6. TST clears CPSR.Z
> >>> 7. BEQ not executed.
> >>> 8. Normal read data moved to register.
> >>>
> >>> So, even if the CPU issues the read from Device and Normal memory in
> >>> order (steps 1, 3), they can happen at the device and RAM level out of
> >>> order (steps 4, 5) and the CPU could read data not yet written by the
> >>> device.
> >>>
> >>> The solution is to use a DSB which ensures the completion of the
> >>> Device read before issuing the Normal memory read.
> >>
> >> I agree a DSB is needed in such case, but I am not sure the patch is
> >> good. Maybe we should keep rmb not changed and only replace __iormb
> >> as DSB if this issue only happens between device io memory and normal
> >> memory, then rmb will not degrade performance a little as does by your
> >> patch.
> >
> > What other use do you see for the mandatory barriers if not in
> > relation to I/O? They are not for SMP configurations. We already have
> > a DSB in wmb().
>
> Yes, there are some usage of rmb/wmb not in relation to I/O, and you can find
> the uses in fs, kernel directory. Also rmb/wmb has been used to order coherent
> memory accesses, such as dma descriptor(see tg3, fireware/ohci, musb,
> ehci/ohci, ...)
>
> So your patch may have performance effect on these drivers and others...
>
> >
> > The mandatory barriers could be present in device drivers that use the
> > relaxed I/O accessors and mandatory barriers explicitly as
> > optimisation. This was the approach recommended in past mailing list
> > discussions.
>
> I think the focus of the discussion is on your change to rmb.
>
> --
> Ming Lei
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses
2011-04-28 18:45 ` Russell King - ARM Linux
@ 2011-04-28 21:37 ` Catalin Marinas
0 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2011-04-28 21:37 UTC (permalink / raw)
To: linux-arm-kernel
On Thursday, 28 April 2011, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> So, what am I supposed to do with 6870/1 (rmb) and the wmb patch?
> One or other is wrong - they certainly are not both correct as rmb()
> can not be stronger than wmb().
Given Ming's reply to the wmb patch he posted, you should discard the
wmb patch. As for the rmb one (6870/1), I already explained why it is
needed.
--
Catalin
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-04-28 21:37 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-25 10:16 [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses Catalin Marinas
2011-03-29 15:02 ` Martin Furmanski
2011-03-29 15:21 ` Catalin Marinas
[not found] ` <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com>
[not found] ` <1301416642.583.101.camel@e102109-lin.cambridge.arm.com>
2011-03-29 21:06 ` Martin Furmanski
2011-03-30 8:53 ` Catalin Marinas
2011-04-07 7:07 ` Ming Lei
2011-04-09 8:57 ` Catalin Marinas
2011-04-09 17:03 ` Ming Lei
2011-04-10 10:10 ` Catalin Marinas
2011-04-11 5:43 ` Ming Lei
2011-04-28 18:45 ` Russell King - ARM Linux
2011-04-28 21:37 ` Catalin Marinas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).