* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses @ 2011-03-25 10:16 Catalin Marinas 2011-03-29 15:02 ` Martin Furmanski 0 siblings, 1 reply; 12+ messages in thread From: Catalin Marinas @ 2011-03-25 10:16 UTC (permalink / raw) To: linux-arm-kernel Since mandatory barriers may be used (explicitly or implicitly via readl etc.) to ensure the ordering between Device and Normal memory accesses, a DMB is not enough. This patch converts it to a DSB. Cc: Russell King <linux@arm.linux.org.uk> Cc: Colin Cross <ccross@android.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- arch/arm/include/asm/system.h | 2 +- arch/arm/mach-realview/include/mach/barriers.h | 2 +- arch/arm/mach-tegra/include/mach/barriers.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h index 9a87823..926d867 100644 --- a/arch/arm/include/asm/system.h +++ b/arch/arm/include/asm/system.h @@ -159,7 +159,7 @@ extern unsigned int user_debug; #include <mach/barriers.h> #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) #define mb() do { dsb(); outer_sync(); } while (0) -#define rmb() dmb() +#define rmb() dsb() #define wmb() mb() #else #include <asm/memory.h> diff --git a/arch/arm/mach-realview/include/mach/barriers.h b/arch/arm/mach-realview/include/mach/barriers.h index 0c5d749..9a73219 100644 --- a/arch/arm/mach-realview/include/mach/barriers.h +++ b/arch/arm/mach-realview/include/mach/barriers.h @@ -4,5 +4,5 @@ * operation to deadlock the system. */ #define mb() dsb() -#define rmb() dmb() +#define rmb() dsb() #define wmb() mb() diff --git a/arch/arm/mach-tegra/include/mach/barriers.h b/arch/arm/mach-tegra/include/mach/barriers.h index cc11517..425b42e 100644 --- a/arch/arm/mach-tegra/include/mach/barriers.h +++ b/arch/arm/mach-tegra/include/mach/barriers.h @@ -23,7 +23,7 @@ #include <asm/outercache.h> -#define rmb() dmb() +#define rmb() dsb() #define wmb() do { dsb(); outer_sync(); } while (0) #define mb() wmb() ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-03-25 10:16 [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses Catalin Marinas @ 2011-03-29 15:02 ` Martin Furmanski 2011-03-29 15:21 ` Catalin Marinas 0 siblings, 1 reply; 12+ messages in thread From: Martin Furmanski @ 2011-03-29 15:02 UTC (permalink / raw) To: linux-arm-kernel Do you have a reference on this? I have been under the impression that DMB is a barrier for all memory accesses. I find no support in ARMv7, for the hypothesis that DSB is needed to order between Device and Normal. Regards, Martin Furmanski On Fri, Mar 25, 2011 at 11:16 AM, Catalin Marinas <catalin.marinas@arm.com>wrote: > Since mandatory barriers may be used (explicitly or implicitly via readl > etc.) to ensure the ordering between Device and Normal memory accesses, > a DMB is not enough. This patch converts it to a DSB. > > Cc: Russell King <linux@arm.linux.org.uk> > Cc: Colin Cross <ccross@android.com> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > --- > arch/arm/include/asm/system.h | 2 +- > arch/arm/mach-realview/include/mach/barriers.h | 2 +- > arch/arm/mach-tegra/include/mach/barriers.h | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h > index 9a87823..926d867 100644 > --- a/arch/arm/include/asm/system.h > +++ b/arch/arm/include/asm/system.h > @@ -159,7 +159,7 @@ extern unsigned int user_debug; > #include <mach/barriers.h> > #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) > #define mb() do { dsb(); outer_sync(); } while (0) > -#define rmb() dmb() > +#define rmb() dsb() > #define wmb() mb() > #else > #include <asm/memory.h> > diff --git a/arch/arm/mach-realview/include/mach/barriers.h > b/arch/arm/mach-realview/include/mach/barriers.h > index 0c5d749..9a73219 100644 > --- a/arch/arm/mach-realview/include/mach/barriers.h > +++ b/arch/arm/mach-realview/include/mach/barriers.h > @@ -4,5 +4,5 @@ > * operation to deadlock the system. > */ > #define mb() dsb() > -#define rmb() dmb() > +#define rmb() dsb() > #define wmb() mb() > diff --git a/arch/arm/mach-tegra/include/mach/barriers.h > b/arch/arm/mach-tegra/include/mach/barriers.h > index cc11517..425b42e 100644 > --- a/arch/arm/mach-tegra/include/mach/barriers.h > +++ b/arch/arm/mach-tegra/include/mach/barriers.h > @@ -23,7 +23,7 @@ > > #include <asm/outercache.h> > > -#define rmb() dmb() > +#define rmb() dsb() > #define wmb() do { dsb(); outer_sync(); } while (0) > #define mb() wmb() > > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110329/af7c1441/attachment-0001.html> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-03-29 15:02 ` Martin Furmanski @ 2011-03-29 15:21 ` Catalin Marinas [not found] ` <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com> 2011-04-07 7:07 ` Ming Lei 0 siblings, 2 replies; 12+ messages in thread From: Catalin Marinas @ 2011-03-29 15:21 UTC (permalink / raw) To: linux-arm-kernel On Tue, 2011-03-29 at 16:02 +0100, Martin Furmanski wrote: > Do you have a reference on this? Usually the ARM ARM but a document with examples is this: http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf (infocenter.arm.com -> Developer Guides and Articles -> ARM architecture -> Barrier Litmus and Tests Cookbook) > I have been under the impression that DMB is a barrier for all memory > accesses. I find no support in ARMv7, for the hypothesis that DSB is > needed to order between Device and Normal. The key point is that DMB only ensures the *observability* of memory accesses by the processors and not arrival to the device or block of memory. In the drawing below, DMB only ensures the ordering at point (1). For arrival to RAM or Device you need a DSB. +------+ +------+ | P1 | | P2 | +------+ +------+ | (1) | +------+------+ | +-------------+-------------+ | | | +------+ +------+ +--------+ | RAM1 | | RAM2 | | Device | +------+ +------+ +--------+ -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com>]
[parent not found: <1301416642.583.101.camel@e102109-lin.cambridge.arm.com>]
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses [not found] ` <1301416642.583.101.camel@e102109-lin.cambridge.arm.com> @ 2011-03-29 21:06 ` Martin Furmanski 2011-03-30 8:53 ` Catalin Marinas 0 siblings, 1 reply; 12+ messages in thread From: Martin Furmanski @ 2011-03-29 21:06 UTC (permalink / raw) To: linux-arm-kernel These statements appear not even remotely valid. Certainly the CPU is only a bus master, so when you say slave/master memory access, what are you referring to? (Sorry for accidentally bringing the thread off the list for the last 2 posts. Bringing it back.) Best Regards, Martin Furmanski On Tue, Mar 29, 2011 at 6:37 PM, Catalin Marinas <catalin.marinas@arm.com>wrote: > On Tue, 2011-03-29 at 17:16 +0100, Martin Furmanski wrote: > > From memory_barriers.txt, regarding rmb(): > > > > " (3) Read (or load) memory barriers. > > > > A read barrier is a data dependency barrier plus a guarantee that > all the > > LOAD operations specified before the barrier will appear to happen > before > > all the LOAD operations specified after the barrier with respect to > the > > other components of the system." > > > > If above is the semantics that you are aiming for, then it is still > irrelevant whether > > the memory is Normal or Device. But other than that, I see no issues. > > The relevant part is that a Device access is usually a *slave* access > while Normal memory access is a *master* one. The DMB only ensures the > latter (as specified by the ARM ARM, though probably not very clear). > > I'm not sure whether the memory-barriers.txt doc covers all the usage in > the kernel but the mandatory barriers (mb/rmb/wmb) may be used to > control the effect of I/O accesses (as per the above document). On ARM, > this needs to be DSB rather than DMB. > > I can even give examples where a DMB would fail with regards to Device > memory. > > -- > Catalin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110329/06e8ffc3/attachment-0001.html> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-03-29 21:06 ` Martin Furmanski @ 2011-03-30 8:53 ` Catalin Marinas 0 siblings, 0 replies; 12+ messages in thread From: Catalin Marinas @ 2011-03-30 8:53 UTC (permalink / raw) To: linux-arm-kernel (could you please avoid top-posting, it makes it harder to follow?) On Tue, 2011-03-29 at 22:06 +0100, Martin Furmanski wrote: > These statements appear not even remotely valid. Certainly the CPU is > only a bus master, so when you say slave/master memory access, what > are you referring to? What I meant by master is memory access that the CPU initiates. By slave I was referring to accesses that the device or block of memory receive as a result of a bus master request. You can rephrase this if you want. On the DMB - it only ensures that the bus master requests are ordered, it doesn't care about the request arrival at a device or block of memory. > (Sorry for accidentally bringing the thread off the list for the last > 2 posts. Bringing it back.) It looks like your messages are held in a moderation queue (you may want to subscribe or wait for the moderator to approve them). -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-03-29 15:21 ` Catalin Marinas [not found] ` <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com> @ 2011-04-07 7:07 ` Ming Lei 2011-04-09 8:57 ` Catalin Marinas 1 sibling, 1 reply; 12+ messages in thread From: Ming Lei @ 2011-04-07 7:07 UTC (permalink / raw) To: linux-arm-kernel Hi Catalin, 2011/3/29 Catalin Marinas <catalin.marinas@arm.com>: > On Tue, 2011-03-29 at 16:02 +0100, Martin Furmanski wrote: >> Do you have a reference on this? > > Usually the ARM ARM but a document with examples is this: > > http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf After glancing over the above, I find DSB is only applied in the WFE/WFI example in section 6, but all other examples in this section do use DMB. So could you point out which example is the reference for the mandatory DSB wrt. read memory barrier? > (infocenter.arm.com -> Developer Guides and Articles -> ARM architecture > -> Barrier Litmus and Tests Cookbook) > >> I have been under the impression that DMB is a barrier for all memory >> accesses. I find no support in ARMv7, for the hypothesis that DSB is >> needed to order between Device and Normal. > > The key point is that DMB only ensures the *observability* of memory > accesses by the processors and not arrival to the device or block of How could you conclude that the memory accesses order is different with the order of memory requests observed on the same type of memory? > memory. In the drawing below, DMB only ensures the ordering at point > (1). For arrival to RAM or Device you need a DSB. > > ? ? ? +------+ ? ? +------+ > ? ? ? | ?P1 ?| ? ? | ?P2 ?| > ? ? ? +------+ ? ? +------+ > ? ? ? ? ?| ? ? (1) ? ? | > ? ? ? ? ?+------+------+ > ? ? ? ? ? ? ? ? | > ? +-------------+-------------+ > ? | ? ? ? ? ? ? | ? ? ? ? ? ? | > +------+ ? ? +------+ ? ? +--------+ > | RAM1 | ? ? | RAM2 | ? ? | Device | > +------+ ? ? +------+ ? ? +--------+ thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-04-07 7:07 ` Ming Lei @ 2011-04-09 8:57 ` Catalin Marinas 2011-04-09 17:03 ` Ming Lei 0 siblings, 1 reply; 12+ messages in thread From: Catalin Marinas @ 2011-04-09 8:57 UTC (permalink / raw) To: linux-arm-kernel Hi, On 7 April 2011 10:07, Ming Lei <tom.leiming@gmail.com> wrote: > 2011/3/29 Catalin Marinas <catalin.marinas@arm.com>: >> On Tue, 2011-03-29 at 16:02 +0100, Martin Furmanski wrote: >>> Do you have a reference on this? >> >> Usually the ARM ARM but a document with examples is this: >> >> http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf > > After glancing over the above, I find DSB is only applied in the WFE/WFI > example in section 6, but all other examples in this section do use DMB. > > So could you point out which example is the reference for the mandatory DSB > wrt. read memory barrier? Probably the above document isn't comprehensive enough. It mainly targets memory ordering between processors. I think another example that mentions DSB is the mailbox scenario. Anyway, my patch is based on the discussions I had with the person that wrote the above document (and the ARM ARM). >>> I have been under the impression that DMB is a barrier for all memory >>> accesses. I find no support in ARMv7, for the hypothesis that DSB is >>> needed to order between Device and Normal. >> >> The key point is that DMB only ensures the *observability* of memory >> accesses by the processors and not arrival to the device or block of > > How could you conclude that the memory accesses order is different with > the order of memory requests observed on the same type of memory? I don't fully understand your question. But I'll give an example where the DMB fails. Let's assume we have a device that performs the two steps below: 1. Writes data to RAM 2. Updates its status register A driver running on the CPU has some code as below: LDR [Device] @ read the device status DMB @ current barrier that we have in readl TST @ check whether the DMA transfer is ready BEQ out LDR [Normal] @ read the DMA buffer ... out: With the code above, the CPU may do the following steps: 1. Issue read from the device. Note that it does not wait for the read to complete. 2. DMB - ensures that no subsequent memory accesses happen before the previous ones. 3. Issues read from normal memory speculatively. This is allowed because the TST/BEQ are only flow control dependency. In case the condition fails, the read is discarded. 4. The read from Normal memory (DMA buffer) completes. This could happen before the I/O read at point 1 depending on the bus speeds. 5. The Device read completes. This can happen after the Normal read because of different bus speeds. 6. TST clears CPSR.Z 7. BEQ not executed. 8. Normal read data moved to register. So, even if the CPU issues the read from Device and Normal memory in order (steps 1, 3), they can happen at the device and RAM level out of order (steps 4, 5) and the CPU could read data not yet written by the device. The solution is to use a DSB which ensures the completion of the Device read before issuing the Normal memory read. Note that if the device would update the ready state in some memory-mapped register via the same port as the CPU, a DMB would be enough (IOW, ordering ensured only for accesses initiated by bus masters). Hope this helps. -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-04-09 8:57 ` Catalin Marinas @ 2011-04-09 17:03 ` Ming Lei 2011-04-10 10:10 ` Catalin Marinas 0 siblings, 1 reply; 12+ messages in thread From: Ming Lei @ 2011-04-09 17:03 UTC (permalink / raw) To: linux-arm-kernel Hi Catalin, 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>: > Probably the above document isn't comprehensive enough. It mainly > targets memory ordering between processors. I think another example > that mentions DSB is the mailbox scenario. I also saw this example of 8.1, but I don't think it is related with linux read memory barrier, see below: - 8.1 requires that the 'STR R5, [R1]' in P1 is completed before 'LDR R5, [R1]' in P2; - Documentation/memory-barriers.txt said: 1), Memory barriers are such interventions. They impose a perceived partial ordering over the memory operations on either side of the barrier. 2),There is no guarantee that any of the memory accesses specified before a memory barrier will be _complete_ by the completion of a memory barrier instruction; the barrier can be considered to draw a line in that CPU's access queue that accesses of the appropriate type may not cross. In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead of in P1 because of the belows: - out of order of the two stores in P1 is allowable - P2 should get message address from the mailbox pointed by R4 first, so how can we make sure the 'STR R1, [R4]' in P1 is completed if R4 doesn't point to a Strongly-Ordered memory. > Anyway, my patch is based on the discussions I had with the person > that wrote the above document (and the ARM ARM). > >>>> I have been under the impression that DMB is a barrier for all memory >>>> accesses. I find no support in ARMv7, for the hypothesis that DSB is >>>> needed to order between Device and Normal. >>> >>> The key point is that DMB only ensures the *observability* of memory >>> accesses by the processors and not arrival to the device or block of >> >> How could you conclude that the memory accesses order is different with >> the order of memory requests observed on the same type of memory? > > I don't fully understand your question. But I'll give an example where > the DMB fails. > > Let's assume we have a device that performs the two steps below: > > 1. Writes data to RAM > 2. Updates its status register > > A driver running on the CPU has some code as below: > > ? ?LDR [Device] ? ?@ read the device status > ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl > ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready > ? ?BEQ out > ? ?LDR [Normal] ? ?@ read the DMA buffer > ? ?... > out: > > With the code above, the CPU may do the following steps: > > 1. Issue read from the device. Note that it does not wait for the read > to complete. > 2. DMB - ensures that no subsequent memory accesses happen before the > previous ones. > 3. Issues read from normal memory speculatively. This is allowed > because the TST/BEQ are only flow control dependency. In case the > condition fails, the read is discarded. > 4. The read from Normal memory (DMA buffer) completes. This could > happen before the I/O read at point 1 depending on the bus speeds. > 5. The Device read completes. This can happen after the Normal read > because of different bus speeds. > 6. TST clears CPSR.Z > 7. BEQ not executed. > 8. Normal read data moved to register. > > So, even if the CPU issues the read from Device and Normal memory in > order (steps 1, 3), they can happen at the device and RAM level out of > order (steps 4, 5) and the CPU could read data not yet written by the > device. > > The solution is to use a DSB which ensures the completion of the > Device read before issuing the Normal memory read. I agree a DSB is needed in such case, but I am not sure the patch is good. Maybe we should keep rmb not changed and only replace __iormb as DSB if this issue only happens between device io memory and normal memory, then rmb will not degrade performance a little as does by your patch. thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-04-09 17:03 ` Ming Lei @ 2011-04-10 10:10 ` Catalin Marinas 2011-04-11 5:43 ` Ming Lei 0 siblings, 1 reply; 12+ messages in thread From: Catalin Marinas @ 2011-04-10 10:10 UTC (permalink / raw) To: linux-arm-kernel On 9 April 2011 20:03, Ming Lei <tom.leiming@gmail.com> wrote: > 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>: >> Probably the above document isn't comprehensive enough. It mainly >> targets memory ordering between processors. I think another example >> that mentions DSB is the mailbox scenario. > > I also saw this example of 8.1, but I don't think it is related with > linux read memory barrier, see below: > > ? ? ? ?- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before > ? ? ? ?'LDR R5, [R1]' in P2; Its not related to the read memory barrier but it refers to ordering between accesses to Normal memory and Device memory. The mailbox here at [R4] can be a device. I'm copying the relevant code here for others following this thread: P1: STR R5, [R1] ; message stored to shared memory location DSB [ST] STR R1, [R4] ; R4 contains the address of a mailbox P2: ; interrupt service routine LDR R5, [R1] > ? ? ? ?- Documentation/memory-barriers.txt said: > ? ? ? ?1), Memory barriers are such interventions. ?They impose a > ? ? ? ?perceived partial ordering over the memory operations on either > ? ? ? ?side of the barrier. > > ? ? ? ?2),There is no guarantee that any of the memory accesses specified > ? ? ? ?before a memory barrier will be _complete_ by the completion of a > ? ? ? ?memory barrier instruction; the barrier can be considered to draw a > ? ? ? ?line in that CPU's access queue that accesses of the appropriate type > ? ? ? ?may not cross. While the above is correct, the Linux document is quite vague in relation to the mandatory barriers and it's not clear whether the above only refers to the SMP barriers. > In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead > of in P1 because of the belows: A DSB on one CPU does not control the ordering on another CPU. So the DSB on P2 is useless. Note that the P2 code is executed in an interrupt routine raised as a result of STR R1, [R4]. If P2 had a loop polling for [R4] to get a valid address, we would indeed need a DSB before LDR R5, [R1]. > ? ? ? ?- out of order of the two stores in P1 is allowable That's why we have a DSB in there, to ensure that the [R1] location is written before we store the R1 to the [R4] device. > ? ? ? ?- P2 should get message address from the mailbox pointed by R4 first, > ? ? ? ?so how can we make sure the 'STR R1, [R4]' in P1 is completed if > ? ? ? ?R4 doesn't point to a Strongly-Ordered memory. Even if the memory is SO, on newer CPUs I don't think it guarantees the completion (pretty much acting like Device memory). But here we don't care when the store to [R4] completes. When this happen, P2 will get an interrupt. P2 does not execute the LDR R5, [R1] in interrupt routine speculatively (i.e. before the interrupt was received). >> Let's assume we have a device that performs the two steps below: >> >> 1. Writes data to RAM >> 2. Updates its status register >> >> A driver running on the CPU has some code as below: >> >> ? ?LDR [Device] ? ?@ read the device status >> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl >> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready >> ? ?BEQ out >> ? ?LDR [Normal] ? ?@ read the DMA buffer >> ? ?... >> out: >> >> With the code above, the CPU may do the following steps: >> >> 1. Issue read from the device. Note that it does not wait for the read >> to complete. >> 2. DMB - ensures that no subsequent memory accesses happen before the >> previous ones. >> 3. Issues read from normal memory speculatively. This is allowed >> because the TST/BEQ are only flow control dependency. In case the >> condition fails, the read is discarded. >> 4. The read from Normal memory (DMA buffer) completes. This could >> happen before the I/O read at point 1 depending on the bus speeds. >> 5. The Device read completes. This can happen after the Normal read >> because of different bus speeds. >> 6. TST clears CPSR.Z >> 7. BEQ not executed. >> 8. Normal read data moved to register. >> >> So, even if the CPU issues the read from Device and Normal memory in >> order (steps 1, 3), they can happen at the device and RAM level out of >> order (steps 4, 5) and the CPU could read data not yet written by the >> device. >> >> The solution is to use a DSB which ensures the completion of the >> Device read before issuing the Normal memory read. > > I agree a DSB is needed in such case, but I am not sure the patch is > good. Maybe we should keep rmb not changed and only replace __iormb > as DSB if this issue only happens between device io memory and normal > memory, then rmb will not degrade performance a little as does by your > patch. What other use do you see for the mandatory barriers if not in relation to I/O? They are not for SMP configurations. We already have a DSB in wmb(). The mandatory barriers could be present in device drivers that use the relaxed I/O accessors and mandatory barriers explicitly as optimisation. This was the approach recommended in past mailing list discussions. -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-04-10 10:10 ` Catalin Marinas @ 2011-04-11 5:43 ` Ming Lei 2011-04-28 18:45 ` Russell King - ARM Linux 0 siblings, 1 reply; 12+ messages in thread From: Ming Lei @ 2011-04-11 5:43 UTC (permalink / raw) To: linux-arm-kernel Hi Catalin, 2011/4/10 Catalin Marinas <catalin.marinas@arm.com>: > On 9 April 2011 20:03, Ming Lei <tom.leiming@gmail.com> wrote: >> 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>: >>> Probably the above document isn't comprehensive enough. It mainly >>> targets memory ordering between processors. I think another example >>> that mentions DSB is the mailbox scenario. >> >> I also saw this example of 8.1, but I don't think it is related with >> linux read memory barrier, see below: >> >> ? ? ? ?- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before >> ? ? ? ?'LDR R5, [R1]' in P2; > > Its not related to the read memory barrier but it refers to ordering > between accesses to Normal memory and Device memory. The mailbox here > at [R4] can be a device. I'm copying the relevant code here for others > following this thread: > > P1: > ? ?STR R5, [R1] ; message stored to shared memory location > ? ?DSB [ST] > ? ?STR R1, [R4] ; R4 contains the address of a mailbox > > P2: > ? ?; interrupt service routine > ? ?LDR R5, [R1] > >> ? ? ? ?- Documentation/memory-barriers.txt said: >> ? ? ? ?1), Memory barriers are such interventions. ?They impose a >> ? ? ? ?perceived partial ordering over the memory operations on either >> ? ? ? ?side of the barrier. >> >> ? ? ? ?2),There is no guarantee that any of the memory accesses specified >> ? ? ? ?before a memory barrier will be _complete_ by the completion of a >> ? ? ? ?memory barrier instruction; the barrier can be considered to draw a >> ? ? ? ?line in that CPU's access queue that accesses of the appropriate type >> ? ? ? ?may not cross. > > While the above is correct, the Linux document is quite vague in > relation to the mandatory barriers and it's not clear whether the > above only refers to the SMP barriers. Yes, I agree it is vague about mandatory barriers, so make Paul involved in. > >> In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead >> of in P1 because of the belows: > > A DSB on one CPU does not control the ordering on another CPU. So the > DSB on P2 is useless. Note that the P2 code is executed in an Yes, your are right, sorry for the noise about this part. > interrupt routine raised as a result of STR R1, [R4]. If P2 had a loop > polling for [R4] to get a valid address, we would indeed need a DSB > before LDR R5, [R1]. > >> ? ? ? ?- out of order of the two stores in P1 is allowable > > That's why we have a DSB in there, to ensure that the [R1] location is > written before we store the R1 to the [R4] device. Yes, I see now, I didn't know the interrupt is triggered by this before. > >> ? ? ? ?- P2 should get message address from the mailbox pointed by R4 first, >> ? ? ? ?so how can we make sure the 'STR R1, [R4]' in P1 is completed if >> ? ? ? ?R4 doesn't point to a Strongly-Ordered memory. > > Even if the memory is SO, on newer CPUs I don't think it guarantees > the completion (pretty much acting like Device memory). But here we > don't care when the store to [R4] completes. When this happen, P2 will > get an interrupt. P2 does not execute the LDR R5, [R1] in interrupt > routine speculatively (i.e. before the interrupt was received). >>> Let's assume we have a device that performs the two steps below: >>> >>> 1. Writes data to RAM >>> 2. Updates its status register >>> >>> A driver running on the CPU has some code as below: >>> >>> ? ?LDR [Device] ? ?@ read the device status >>> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl >>> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready >>> ? ?BEQ out >>> ? ?LDR [Normal] ? ?@ read the DMA buffer >>> ? ?... >>> out: >>> >>> With the code above, the CPU may do the following steps: >>> >>> 1. Issue read from the device. Note that it does not wait for the read >>> to complete. >>> 2. DMB - ensures that no subsequent memory accesses happen before the >>> previous ones. >>> 3. Issues read from normal memory speculatively. This is allowed >>> because the TST/BEQ are only flow control dependency. In case the >>> condition fails, the read is discarded. >>> 4. The read from Normal memory (DMA buffer) completes. This could >>> happen before the I/O read at point 1 depending on the bus speeds. >>> 5. The Device read completes. This can happen after the Normal read >>> because of different bus speeds. >>> 6. TST clears CPSR.Z >>> 7. BEQ not executed. >>> 8. Normal read data moved to register. >>> >>> So, even if the CPU issues the read from Device and Normal memory in >>> order (steps 1, 3), they can happen at the device and RAM level out of >>> order (steps 4, 5) and the CPU could read data not yet written by the >>> device. >>> >>> The solution is to use a DSB which ensures the completion of the >>> Device read before issuing the Normal memory read. >> >> I agree a DSB is needed in such case, but I am not sure the patch is >> good. Maybe we should keep rmb not changed and only replace __iormb >> as DSB if this issue only happens between device io memory and normal >> memory, then rmb will not degrade performance a little as does by your >> patch. > > What other use do you see for the mandatory barriers if not in > relation to I/O? They are not for SMP configurations. We already have > a DSB in wmb(). Yes, there are some usage of rmb/wmb not in relation to I/O, and you can find the uses in fs, kernel directory. Also rmb/wmb has been used to order coherent memory accesses, such as dma descriptor(see tg3, fireware/ohci, musb, ehci/ohci, ...) So your patch may have performance effect on these drivers and others... > > The mandatory barriers could be present in device drivers that use the > relaxed I/O accessors and mandatory barriers explicitly as > optimisation. This was the approach recommended in past mailing list > discussions. I think the focus of the discussion is on your change to rmb. -- Ming Lei ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-04-11 5:43 ` Ming Lei @ 2011-04-28 18:45 ` Russell King - ARM Linux 2011-04-28 21:37 ` Catalin Marinas 0 siblings, 1 reply; 12+ messages in thread From: Russell King - ARM Linux @ 2011-04-28 18:45 UTC (permalink / raw) To: linux-arm-kernel So, what am I supposed to do with 6870/1 (rmb) and the wmb patch? One or other is wrong - they certainly are not both correct as rmb() can not be stronger than wmb(). On Mon, Apr 11, 2011 at 01:43:35PM +0800, Ming Lei wrote: > Hi Catalin, > > 2011/4/10 Catalin Marinas <catalin.marinas@arm.com>: > > On 9 April 2011 20:03, Ming Lei <tom.leiming@gmail.com> wrote: > >> 2011/4/9 Catalin Marinas <catalin.marinas@arm.com>: > >>> Probably the above document isn't comprehensive enough. It mainly > >>> targets memory ordering between processors. I think another example > >>> that mentions DSB is the mailbox scenario. > >> > >> I also saw this example of 8.1, but I don't think it is related with > >> linux read memory barrier, see below: > >> > >> ? ? ? ?- 8.1 requires that the 'STR R5, [R1]' in P1 is completed before > >> ? ? ? ?'LDR R5, [R1]' in P2; > > > > Its not related to the read memory barrier but it refers to ordering > > between accesses to Normal memory and Device memory. The mailbox here > > at [R4] can be a device. I'm copying the relevant code here for others > > following this thread: > > > > P1: > > ? ?STR R5, [R1] ; message stored to shared memory location > > ? ?DSB [ST] > > ? ?STR R1, [R4] ; R4 contains the address of a mailbox > > > > P2: > > ? ?; interrupt service routine > > ? ?LDR R5, [R1] > > > >> ? ? ? ?- Documentation/memory-barriers.txt said: > >> ? ? ? ?1), Memory barriers are such interventions. ?They impose a > >> ? ? ? ?perceived partial ordering over the memory operations on either > >> ? ? ? ?side of the barrier. > >> > >> ? ? ? ?2),There is no guarantee that any of the memory accesses specified > >> ? ? ? ?before a memory barrier will be _complete_ by the completion of a > >> ? ? ? ?memory barrier instruction; the barrier can be considered to draw a > >> ? ? ? ?line in that CPU's access queue that accesses of the appropriate type > >> ? ? ? ?may not cross. > > > > While the above is correct, the Linux document is quite vague in > > relation to the mandatory barriers and it's not clear whether the > > above only refers to the SMP barriers. > > Yes, I agree it is vague about mandatory barriers, so make Paul involved in. > > > > >> In fact, IMO, the 'DSB' should be used before 'LDR R5, [R1]' in P2 instead > >> of in P1 because of the belows: > > > > A DSB on one CPU does not control the ordering on another CPU. So the > > DSB on P2 is useless. Note that the P2 code is executed in an > > Yes, your are right, sorry for the noise about this part. > > > interrupt routine raised as a result of STR R1, [R4]. If P2 had a loop > > polling for [R4] to get a valid address, we would indeed need a DSB > > before LDR R5, [R1]. > > > >> ? ? ? ?- out of order of the two stores in P1 is allowable > > > > That's why we have a DSB in there, to ensure that the [R1] location is > > written before we store the R1 to the [R4] device. > > Yes, I see now, I didn't know the interrupt is triggered by this before. > > > > >> ? ? ? ?- P2 should get message address from the mailbox pointed by R4 first, > >> ? ? ? ?so how can we make sure the 'STR R1, [R4]' in P1 is completed if > >> ? ? ? ?R4 doesn't point to a Strongly-Ordered memory. > > > > Even if the memory is SO, on newer CPUs I don't think it guarantees > > the completion (pretty much acting like Device memory). But here we > > don't care when the store to [R4] completes. When this happen, P2 will > > get an interrupt. P2 does not execute the LDR R5, [R1] in interrupt > > routine speculatively (i.e. before the interrupt was received). > >>> Let's assume we have a device that performs the two steps below: > >>> > >>> 1. Writes data to RAM > >>> 2. Updates its status register > >>> > >>> A driver running on the CPU has some code as below: > >>> > >>> ? ?LDR [Device] ? ?@ read the device status > >>> ? ?DMB ? ? ? ? ? ? ? @ current barrier that we have in readl > >>> ? ?TST ? ? ? ? ? ? ? ? @ check whether the DMA transfer is ready > >>> ? ?BEQ out > >>> ? ?LDR [Normal] ? ?@ read the DMA buffer > >>> ? ?... > >>> out: > >>> > >>> With the code above, the CPU may do the following steps: > >>> > >>> 1. Issue read from the device. Note that it does not wait for the read > >>> to complete. > >>> 2. DMB - ensures that no subsequent memory accesses happen before the > >>> previous ones. > >>> 3. Issues read from normal memory speculatively. This is allowed > >>> because the TST/BEQ are only flow control dependency. In case the > >>> condition fails, the read is discarded. > >>> 4. The read from Normal memory (DMA buffer) completes. This could > >>> happen before the I/O read at point 1 depending on the bus speeds. > >>> 5. The Device read completes. This can happen after the Normal read > >>> because of different bus speeds. > >>> 6. TST clears CPSR.Z > >>> 7. BEQ not executed. > >>> 8. Normal read data moved to register. > >>> > >>> So, even if the CPU issues the read from Device and Normal memory in > >>> order (steps 1, 3), they can happen at the device and RAM level out of > >>> order (steps 4, 5) and the CPU could read data not yet written by the > >>> device. > >>> > >>> The solution is to use a DSB which ensures the completion of the > >>> Device read before issuing the Normal memory read. > >> > >> I agree a DSB is needed in such case, but I am not sure the patch is > >> good. Maybe we should keep rmb not changed and only replace __iormb > >> as DSB if this issue only happens between device io memory and normal > >> memory, then rmb will not degrade performance a little as does by your > >> patch. > > > > What other use do you see for the mandatory barriers if not in > > relation to I/O? They are not for SMP configurations. We already have > > a DSB in wmb(). > > Yes, there are some usage of rmb/wmb not in relation to I/O, and you can find > the uses in fs, kernel directory. Also rmb/wmb has been used to order coherent > memory accesses, such as dma descriptor(see tg3, fireware/ohci, musb, > ehci/ohci, ...) > > So your patch may have performance effect on these drivers and others... > > > > > The mandatory barriers could be present in device drivers that use the > > relaxed I/O accessors and mandatory barriers explicitly as > > optimisation. This was the approach recommended in past mailing list > > discussions. > > I think the focus of the discussion is on your change to rmb. > > -- > Ming Lei > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses 2011-04-28 18:45 ` Russell King - ARM Linux @ 2011-04-28 21:37 ` Catalin Marinas 0 siblings, 0 replies; 12+ messages in thread From: Catalin Marinas @ 2011-04-28 21:37 UTC (permalink / raw) To: linux-arm-kernel On Thursday, 28 April 2011, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > So, what am I supposed to do with 6870/1 (rmb) and the wmb patch? > One or other is wrong - they certainly are not both correct as rmb() > can not be stronger than wmb(). Given Ming's reply to the wmb patch he posted, you should discard the wmb patch. As for the rmb one (6870/1), I already explained why it is needed. -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-04-28 21:37 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-25 10:16 [PATCH] ARM: The mandatory barrier rmb() must be a dsb() in for device accesses Catalin Marinas 2011-03-29 15:02 ` Martin Furmanski 2011-03-29 15:21 ` Catalin Marinas [not found] ` <AANLkTim1=686NAO7ox8s-HAd_KT4OjiUb1Y7jrUtgia3@mail.gmail.com> [not found] ` <1301416642.583.101.camel@e102109-lin.cambridge.arm.com> 2011-03-29 21:06 ` Martin Furmanski 2011-03-30 8:53 ` Catalin Marinas 2011-04-07 7:07 ` Ming Lei 2011-04-09 8:57 ` Catalin Marinas 2011-04-09 17:03 ` Ming Lei 2011-04-10 10:10 ` Catalin Marinas 2011-04-11 5:43 ` Ming Lei 2011-04-28 18:45 ` Russell King - ARM Linux 2011-04-28 21:37 ` Catalin Marinas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).