* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness @ 2011-03-02 1:23 Saravana Kannan 2011-03-02 8:23 ` Arnd Bergmann 2011-03-02 8:39 ` Russell King - ARM Linux 0 siblings, 2 replies; 12+ messages in thread From: Saravana Kannan @ 2011-03-02 1:23 UTC (permalink / raw) To: linux-arm-kernel Russell, Catalin, In arch/arm/include/asm/io.h, we have the following piece of code: #ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE #define __iormb() rmb() #define __iowmb() wmb() #else #define __iormb() do { } while (0) #define __iowmb() do { } while (0) #endif #define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; }) #define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; }) #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; }) #define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); }) #define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); }) #define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); }) If I'm not missing some magic, this would mean that "CONFIG_ARM_DMA_MEM_BUFFERABLE" determines if readl(s)/writel(s) get to have a built in mb() or not. The rest of the emails is under the assumption that my statement above is true. This seems like a real bad thing to do. There are so many other drivers that don't use or care about DMA and might still want to ensure some ordering constraints between their readl(s)/writel(s). They can't depend on readl/writel taking care of it for them since their code could be used in a kernel configuration that doesn't enable this config. Firstly, I don't know how many people noticed this and realize they can't depend on readl/writel to take care of the mb()s for them. Seems like an unnecessary encouragement to make mistakes when it didn't need to be so. Secondly, even if they realize they have to take care of it, they will have to continue using mb()s in to force ordering between their reads/writes. So, are we depending on the compiler to optimize these extra mb() out in the case where the config is enabled? I'm not sure it will be able to optimize out the extra mb()s in all cases. Please let me know if I'm missing something and there is a good reason for writing the code as it is today. If there is not much of a reason for this, do you have any objections if I send a patch to split out the "readl/writel with built in mb()" as a separate config and 'select'ing it in ARM_DMA_MEM_BUFFERABLE? Thanks, Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-02 1:23 CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness Saravana Kannan @ 2011-03-02 8:23 ` Arnd Bergmann 2011-03-03 7:57 ` Saravana Kannan 2011-03-02 8:39 ` Russell King - ARM Linux 1 sibling, 1 reply; 12+ messages in thread From: Arnd Bergmann @ 2011-03-02 8:23 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 02 March 2011 02:23:15 Saravana Kannan wrote: > There are so many other drivers that don't use or care about DMA and > might still want to ensure some ordering constraints between their > readl(s)/writel(s). They can't depend on readl/writel taking care of it > for them since their code could be used in a kernel configuration that > doesn't enable this config. What exactly are the ordering constraints, do you need the full dmb() for readl() and dsb() for writel(), or just a compiler barrier? I think we need the barrier() even for the relaxed variant, but that is fairly lightweight. What would be a reason to have more? > Firstly, I don't know how many people noticed this and realize they > can't depend on readl/writel to take care of the mb()s for them. Seems > like an unnecessary encouragement to make mistakes when it didn't need > to be so. > > Secondly, even if they realize they have to take care of it, they will > have to continue using mb()s in to force ordering between their > reads/writes. So, are we depending on the compiler to optimize these > extra mb() out in the case where the config is enabled? I'm not sure it > will be able to optimize out the extra mb()s in all cases. The compiler certainly won't merge multiple inline assembly statements, but multiple barrier() statements don't make it worse than just one. barrier() just forces accessing variables from memory that could otherwise be kept in registers. The other problems we still need to fix are the complete absence of barriers in inb()/outb() style accessors, which are meant to be stricter than readl()/writel(), and the fact that we rely on undefined behavior in gcc regarding the atomicity of multi-byte accesses, as we recently found out when a new gcc turned a readl into byte accesses. Arnd ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-02 8:23 ` Arnd Bergmann @ 2011-03-03 7:57 ` Saravana Kannan 0 siblings, 0 replies; 12+ messages in thread From: Saravana Kannan @ 2011-03-03 7:57 UTC (permalink / raw) To: linux-arm-kernel On 03/02/2011 12:23 AM, Arnd Bergmann wrote: > On Wednesday 02 March 2011 02:23:15 Saravana Kannan wrote: >> There are so many other drivers that don't use or care about DMA and >> might still want to ensure some ordering constraints between their >> readl(s)/writel(s). They can't depend on readl/writel taking care of it >> for them since their code could be used in a kernel configuration that >> doesn't enable this config. > > What exactly are the ordering constraints, do you need the full > dmb() for readl() and dsb() for writel(), or just a compiler barrier? I wasn't referring to a compiler barrier. I'm referring to one of the real barrier instructions. > I think we need the barrier() even for the relaxed variant, but > that is fairly lightweight. What would be a reason to have more? Please see my response to Russell in other other email. I think it should answer your question by clarifying my point. >> Firstly, I don't know how many people noticed this and realize they >> can't depend on readl/writel to take care of the mb()s for them. Seems >> like an unnecessary encouragement to make mistakes when it didn't need >> to be so. >> >> Secondly, even if they realize they have to take care of it, they will >> have to continue using mb()s in to force ordering between their >> reads/writes. So, are we depending on the compiler to optimize these >> extra mb() out in the case where the config is enabled? I'm not sure it >> will be able to optimize out the extra mb()s in all cases. > > The compiler certainly won't merge multiple inline assembly statements, > but multiple barrier() statements don't make it worse than just one. > barrier() just forces accessing variables from memory that could otherwise > be kept in registers. Yes, I was aware of how compiler barriers work and how multiple consecutive compiler barriers are the same as one compiler barrier. The email was about the real barrier instructions. > The other problems we still need to fix are the complete absence of > barriers in inb()/outb() style accessors, which are meant to be stricter > than readl()/writel(), and the fact that we rely on undefined behavior > in gcc regarding the atomicity of multi-byte accesses, as we recently > found out when a new gcc turned a readl into byte accesses. I haven't gotten that far :-) I really appreciate response! Thanks, Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-02 1:23 CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness Saravana Kannan 2011-03-02 8:23 ` Arnd Bergmann @ 2011-03-02 8:39 ` Russell King - ARM Linux 2011-03-03 7:49 ` Saravana Kannan 1 sibling, 1 reply; 12+ messages in thread From: Russell King - ARM Linux @ 2011-03-02 8:39 UTC (permalink / raw) To: linux-arm-kernel On Tue, Mar 01, 2011 at 05:23:15PM -0800, Saravana Kannan wrote: > If I'm not missing some magic, this would mean that > "CONFIG_ARM_DMA_MEM_BUFFERABLE" determines if readl(s)/writel(s) get to > have a built in mb() or not. You're missing that CONFIG_ARM_DMA_MEM_BUFFERABLE not only changes readl/writel but also the type for DMA coherent memory from strongly ordered to memory, non-cacheable. The barriers are required to ensure that reads and writes to DMA coherent memory are visible to the DMA device before the write completes, and any value read from DMA coherent memory will not bypass a read from a DMA device. The barriers in the IO macros have nothing to do with whether reads/writes to normal cacheable memory are visible to DMA devices. That is what the streaming DMA API is for. In any case, the IO macros are always ordered with respect to other device writes irrespective of CONFIG_ARM_DMA_MEM_BUFFERABLE. > There are so many other drivers that don't use or care about DMA and > might still want to ensure some ordering constraints between their > readl(s)/writel(s). They can't depend on readl/writel taking care of it > for them since their code could be used in a kernel configuration that > doesn't enable this config. The majority of device drivers don't need ordering on their IO macros. However, Linus refuses to introduce relaxed IO ordering to the kernel, saying that architectures must reflect the x86 behaviour as much as possible. So, we're stuck with device drivers which use readl/writel both where they don't need the ordering constraints _and_ where they do need the ordering constraints. That means we must provide the ordering in these macros to ensure proper system functioning. > Firstly, I don't know how many people noticed this and realize they > can't depend on readl/writel to take care of the mb()s for them. Seems > like an unnecessary encouragement to make mistakes when it didn't need > to be so. I think you misunderstand what's going on. IO accesses are always ordered with respect to themselves. The barriers are there to ensure ordering between DMA coherent memory (normal non-cached memory) and IO accesses (device). ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-02 8:39 ` Russell King - ARM Linux @ 2011-03-03 7:49 ` Saravana Kannan 2011-03-03 10:11 ` Catalin Marinas 2011-03-03 10:24 ` Russell King - ARM Linux 0 siblings, 2 replies; 12+ messages in thread From: Saravana Kannan @ 2011-03-03 7:49 UTC (permalink / raw) To: linux-arm-kernel Sorry that it took a while for me to get back. Had to contact our resident ARM expert to reconfirm my points and gather all the data in an externally available format. On 03/02/2011 12:39 AM, Russell King - ARM Linux wrote: > On Tue, Mar 01, 2011 at 05:23:15PM -0800, Saravana Kannan wrote: >> If I'm not missing some magic, this would mean that >> "CONFIG_ARM_DMA_MEM_BUFFERABLE" determines if readl(s)/writel(s) get to >> have a built in mb() or not. > > You're missing that CONFIG_ARM_DMA_MEM_BUFFERABLE not only changes > readl/writel but also the type for DMA coherent memory from strongly > ordered to memory, non-cacheable. Yeah, I noticed that after I sent the email, but my questions in my previous email still remains valid. See below. > The barriers are required to ensure that reads and writes to DMA > coherent memory are visible to the DMA device before the write > completes, and any value read from DMA coherent memory will not > bypass a read from a DMA device. > > The barriers in the IO macros have nothing to do with whether reads/writes > to normal cacheable memory are visible to DMA devices. That is what the > streaming DMA API is for. > > In any case, the IO macros are always ordered with respect to other > device writes irrespective of CONFIG_ARM_DMA_MEM_BUFFERABLE. <snip> > I think you misunderstand what's going on. IO accesses are always ordered > with respect to themselves. The barriers are there to ensure ordering > between DMA coherent memory (normal non-cached memory) and IO accesses > (device). Unfortunately this is not correct. The ARM spec doesn't guarantee that all IO accesses should be ordered with respect to themselves. It only requires that the ordering should be guaranteed at least within a 1KB region. You can find this info in ARMv7 ARM spec[1] named "DDI0406B_arm_architecture_reference_manual_errata_markup_8_0.pdf", on page A3-45. There is a para that goes: "Accesses must arrive at any particular memory-mapped peripheral or block of memory in program order, that is, A1 must arrive before A2. There are no ordering restrictions about when accesses arrive at different peripherals or blocks of memory, provided that the accesses follow the general ordering rules given in this section." And the most critical point is hidden in a comment that goes: "The size of a memory mapped peripheral, or a block of memory, is IMPLEMENTATION DEFINED, but is not smaller than 1KByte." I guess most of the confusion is due to the ARM spec not being very obvious about the 1KB limitation. So, going back to my point, I think it's wrong for CONFIG_ARM_DMA_MEM_BUFFERABLE to control how stuff unrelated to DMA behaves. I have also encountered a few people who kept went "but readl/writel was recently changed to add mem barriers, so we can all remove the mb()s in our driver (unrelated to DMA) code". That would have made their code incorrect for two reasons: 1. readl/writel doesn't always have a mem barrier because of config that can be turned off. 2. In cases where readl/writel didn't have mb(), there is not enough ordering guarantee without an explicit mb(). I think as a community, we should stop saying that readl/writel ensures ordering with respect to all IO accesses. It doesn't even guarantee ordering within the same device (when their register regions are > 1KB). After reading the above, please let me know if a patch to decouple the "readl/writel with builtin mb()" from CONFIG_ARM_DMA_MEM_BUFFERABLE would be accepted. If so, I can go ahead and send it out soon. [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406b/index.html Thanks, Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-03 7:49 ` Saravana Kannan @ 2011-03-03 10:11 ` Catalin Marinas 2011-03-09 4:37 ` Saravana Kannan 2011-03-03 10:24 ` Russell King - ARM Linux 1 sibling, 1 reply; 12+ messages in thread From: Catalin Marinas @ 2011-03-03 10:11 UTC (permalink / raw) To: linux-arm-kernel On Thu, 2011-03-03 at 07:49 +0000, Saravana Kannan wrote: > On 03/02/2011 12:39 AM, Russell King - ARM Linux wrote: > > On Tue, Mar 01, 2011 at 05:23:15PM -0800, Saravana Kannan wrote: > >> If I'm not missing some magic, this would mean that > >> "CONFIG_ARM_DMA_MEM_BUFFERABLE" determines if readl(s)/writel(s) get to > >> have a built in mb() or not. [...] > > I think you misunderstand what's going on. IO accesses are always ordered > > with respect to themselves. The barriers are there to ensure ordering > > between DMA coherent memory (normal non-cached memory) and IO accesses > > (device). > > Unfortunately this is not correct. The ARM spec doesn't guarantee that > all IO accesses should be ordered with respect to themselves. It only > requires that the ordering should be guaranteed at least within a 1KB > region. That's because the CPU does not have control of the delays on various buses. But a device connected to the same bus receives the accesses in order. > And the most critical point is hidden in a comment that goes: > "The size of a memory mapped peripheral, or a block of memory, is > IMPLEMENTATION DEFINED, but is not smaller than 1KByte." > > I guess most of the confusion is due to the ARM spec not being very > obvious about the 1KB limitation. What that means is that the hardware shouldn't have two different buses (possibly with different delays) within a 1KB range. Even if accesses to all peripherals are ordered, you still cannot guarantee that a writel() would change the state of a device (and that's specific to all architectures). Sometimes if you want to make sure the device state changed you need a readl() back. > So, going back to my point, I think it's wrong for > CONFIG_ARM_DMA_MEM_BUFFERABLE to control how stuff unrelated to DMA behaves. In an ideal world, all driver authors know what memory ordering is and add the necessary barriers. But since that's not the case, the only way to get ordering between Normal Non-cacheable access (DMA buffer) and Device access (via writel) is to add the mb() in the I/O accessors. This has been discussed at length on several occasions on linux-arch and LKML. We don't have other solution since adding barriers to drivers wasn't found feasible by others. Of course, if you can optimised your driver to use the relaxed accessors and add explicit barriers. > I have also encountered a few people who kept went "but readl/writel was > recently changed to add mem barriers, so we can all remove the mb()s in > our driver (unrelated to DMA) code". But why did they have those barriers around I/O accessors in the first place? As you say, nothing related to DMA. If you access two different devices and want to ensure an ordering of the state changes, I doubt a barrier would help. Most likely you need a read back from the device. > That would have made their code incorrect for two reasons: > 1. readl/writel doesn't always have a mem barrier because of config that > can be turned off. > 2. In cases where readl/writel didn't have mb(), there is not enough > ordering guarantee without an explicit mb(). See my comment above, what do they try to achieve by using mb() around already ordered I/O accessors? > I think as a community, we should stop saying that readl/writel ensures > ordering with respect to all IO accesses. It doesn't even guarantee > ordering within the same device (when their register regions are > 1KB). It definitely guarantees ordering to the same device. The 1KB is a minimum limit but there is no upper bound (and it should definitely cover a single device). > After reading the above, please let me know if a patch to decouple the > "readl/writel with builtin mb()" from CONFIG_ARM_DMA_MEM_BUFFERABLE > would be accepted. If so, I can go ahead and send it out soon. No, I don't think this would be acceptable. I can point you to past discussion where Linus stated that Normal Non-cacheable memory and I/O accesses should be ordered. What you can do actually is make sure that all architectures support the relaxed accessors and change the drivers to use these instead. -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-03 10:11 ` Catalin Marinas @ 2011-03-09 4:37 ` Saravana Kannan 0 siblings, 0 replies; 12+ messages in thread From: Saravana Kannan @ 2011-03-09 4:37 UTC (permalink / raw) To: linux-arm-kernel Discussed this further with our resident ARM expert. The rest of the email is me understanding his comments and rephrasing them to suit this thread. On 03/03/2011 02:11 AM, Catalin Marinas wrote: > On Thu, 2011-03-03 at 07:49 +0000, Saravana Kannan wrote: >> On 03/02/2011 12:39 AM, Russell King - ARM Linux wrote: >>> On Tue, Mar 01, 2011 at 05:23:15PM -0800, Saravana Kannan wrote: >>>> If I'm not missing some magic, this would mean that >>>> "CONFIG_ARM_DMA_MEM_BUFFERABLE" determines if readl(s)/writel(s) get to >>>> have a built in mb() or not. > [...] >>> I think you misunderstand what's going on. IO accesses are always ordered >>> with respect to themselves. The barriers are there to ensure ordering >>> between DMA coherent memory (normal non-cached memory) and IO accesses >>> (device). >> >> Unfortunately this is not correct. The ARM spec doesn't guarantee that >> all IO accesses should be ordered with respect to themselves. It only >> requires that the ordering should be guaranteed at least within a 1KB >> region. > > That's because the CPU does not have control of the delays on various > buses. But a device connected to the same bus receives the accesses in > order. From the ARM spec, at the least, we have established that IO accesses are not guaranteed to be ordered with respect to all IO accesses. So, I kindly request that we no longer perpetuate the incorrect statement that IO accesses are ordered wrt to all IO accesses. The only continuing disagreement is on whether the minimum limit is 1KB or "the entire address space of a given device". More on that below. >> And the most critical point is hidden in a comment that goes: >> "The size of a memory mapped peripheral, or a block of memory, is >> IMPLEMENTATION DEFINED, but is not smaller than 1KByte." >> >> I guess most of the confusion is due to the ARM spec not being very >> obvious about the 1KB limitation. > > What that means is that the hardware shouldn't have two different buses > (possibly with different delays) within a 1KB range. Rephrasing comments from our resident expert: This 1KB limit is not really tied to there being different "speeds" on the buses. Rather it is a fundamental topological issue. It has to do with how the "core" interleaves the interconnect. If the interleave is at the 1KB level, there are two different bus interfaces at the 1KB interleave. So even if the downstream "device" has a single *input* bus, the processor and the interconnect have two different paths to get to that device, if the address space spans 1KB. So there is no guaranteed order across the 1KB boundary, even if in concept the two sides of the boundary are talking to the same "device". > Even if accesses to all peripherals are ordered, you still cannot > guarantee that a writel() would change the state of a device (and that's > specific to all architectures). Sometimes if you want to make sure the > device state changed you need a readl() back. Yes, a device might take much longer to change the state or process the command after the write completes. I realize that has nothing to do with memory barriers. >> So, going back to my point, I think it's wrong for >> CONFIG_ARM_DMA_MEM_BUFFERABLE to control how stuff unrelated to DMA behaves. > > In an ideal world, all driver authors know what memory ordering is and > add the necessary barriers. But since that's not the case, the only way > to get ordering between Normal Non-cacheable access (DMA buffer) and > Device access (via writel) is to add the mb() in the I/O accessors. > > This has been discussed at length on several occasions on linux-arch and > LKML. We don't have other solution since adding barriers to drivers > wasn't found feasible by others. Of course, if you can optimised your > driver to use the relaxed accessors and add explicit barriers. I think my comment for clean up was misunderstood to be more that what I intend. I will address this at the end of this email. >> I have also encountered a few people who kept went "but readl/writel was >> recently changed to add mem barriers, so we can all remove the mb()s in >> our driver (unrelated to DMA) code". > > But why did they have those barriers around I/O accessors in the first > place? As you say, nothing related to DMA. If you access two different > devices and want to ensure an ordering of the state changes, I doubt a > barrier would help. Most likely you need a read back from the device. These barriers were needed because the device spanned more than 1KB of address space. See below. >> That would have made their code incorrect for two reasons: >> 1. readl/writel doesn't always have a mem barrier because of config that >> can be turned off. >> 2. In cases where readl/writel didn't have mb(), there is not enough >> ordering guarantee without an explicit mb(). > > See my comment above, what do they try to achieve by using mb() around > already ordered I/O accessors? > >> I think as a community, we should stop saying that readl/writel ensures >> ordering with respect to all IO accesses. It doesn't even guarantee >> ordering within the same device (when their register regions are> 1KB). > > It definitely guarantees ordering to the same device. The 1KB is a > minimum limit but there is no upper bound (and it should definitely > cover a single device). Rephrasing comments from our resident expert: That is not true. It only guarantees ordering to the 1KB boundary. That is the implementation-defined limit of what constitutes a "device". Any "device" which crosses that boundary is viewed as two separate devices from the core's perspective, and there is no guaranteed order, absent an explicit barrier operation. The 1KB is the minimum size that the architecture allows the hardware to establish as its limit for guaranteed ordering. It is not true that the hardware is obligated to go *above* that limit just because a given "device" happens to have address space that crosses the boundary. As soon as you have two separate interconnect ports and thus two separate paths to get to the device, because it crossed the 1KB boundary, then you have the potential of being out-of-order. Software is obligated to use explicit barriers on accesses that cross that threshold, regardless of whether *physically* the accesses end up at the same downstream "device". >> After reading the above, please let me know if a patch to decouple the >> "readl/writel with builtin mb()" from CONFIG_ARM_DMA_MEM_BUFFERABLE >> would be accepted. If so, I can go ahead and send it out soon. > > No, I don't think this would be acceptable. I can point you to past > discussion where Linus stated that Normal Non-cacheable memory and I/O > accesses should be ordered. > > What you can do actually is make sure that all architectures support the > relaxed accessors and change the drivers to use these instead. There were a couple of reason for starting this thread: * Check if my understanding of CONFIG_ARM_DMA_MEM_BUFFERABLE's effect on readl/writl were correct and if I was missing anything. Result: Yes, I was missing some stuff about the config affecting DMA memory mapping, but I did understand the point about having mb()s in readl/writel correctly. * Bring to attention that the ARM spec doesn't guarantee all IO accesses are ordered wrt each other and have the community update it's understanding of IO access ordering. Result: It think we agree on this update. At least Russell seems to. The ARM spec is very clear on this. The only part we disagree on is the 1KB limit. * If my understanding about how CONFIG_ARM_DMA_MEM_BUFFERABLE affects readl/writel was correct and we agree on the IO access ordering update, then it would show that mb()s matter for more than just DMA. In which case, I wanted to split that single config into two and clean up the names while still leaving CONFIG_ARM_DMA_MEM_BUFFERABLE functionally the same. Result?: After the above discussion, I hope the community agrees that it's reasonable to reorganize and rename the config to make the config more understandable and less confusing to developers. Less confusion == less bugs in my opinion. I will send out a patch sometime this week to show what I mean. But modifying the behaviour of DMA was not my intention. I will of course read the discussion you mention (will ping you if I can't find it) to get a better understanding of the DMA stuff. Catalin, To summarize, Our internal ARM expert continues to state that the 1KB is the minimum limit even within a single device. Judging his expertise based on my previous interactions with him, I'm inclined to believe him. I think at the least we should give him the benefit of the doubt. Looks like a definitive way to settle this disagreement would be to check with the author of those updates/comment I referred to in the ARMv7 ARM. If the 1KB limit is correct, it would certainly help improving the correctness of drivers for the ARM kernel. If the 1KB limit is wrong, we don't lose anything. So, if you don't mind, could you please check with the author of those updates and comments in the ARMv7 ARM? Thanks, Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-03 7:49 ` Saravana Kannan 2011-03-03 10:11 ` Catalin Marinas @ 2011-03-03 10:24 ` Russell King - ARM Linux 2011-03-09 4:58 ` Saravana Kannan 1 sibling, 1 reply; 12+ messages in thread From: Russell King - ARM Linux @ 2011-03-03 10:24 UTC (permalink / raw) To: linux-arm-kernel On Wed, Mar 02, 2011 at 11:49:47PM -0800, Saravana Kannan wrote: >> I think you misunderstand what's going on. IO accesses are always ordered >> with respect to themselves. The barriers are there to ensure ordering >> between DMA coherent memory (normal non-cached memory) and IO accesses >> (device). > > Unfortunately this is not correct. The ARM spec doesn't guarantee that > all IO accesses should be ordered with respect to themselves. It only > requires that the ordering should be guaranteed at least within a 1KB > region. > > You can find this info in ARMv7 ARM spec[1] named > "DDI0406B_arm_architecture_reference_manual_errata_markup_8_0.pdf", on > page A3-45. There is a para that goes: > > "Accesses must arrive at any particular memory-mapped peripheral or > block of memory in program order, that is, A1 must arrive before A2. > There are no ordering restrictions about when accesses arrive at > different peripherals or blocks of memory, provided that the accesses > follow the general ordering rules given in this section." That is news to me. My DDI0406B does not have this paragraph, so it's something that ARM has sprung upon us without telling *anyone* about it. It's not unreasonable or even unexpected. That is exactly the same condition which applies on buses like PCI due to write posting on bridges downstream of the CPU, and issuing memory barriers will not help with that. Consider two PCI devices each behind their own P2P bridge. Device A's bridge is really lazy and takes time to empty its write post buffer. Device B's bridge is really fast at getting writes. If you write to device A then device B, they'll arrive at device B before device A. Again, let me stress that memory barriers will not allow you to solve this problem. The only way to solve this is to read back from the device, because reads to device memory can not bypass writes to device memory, otherwise the system is unpredictable. With PCI, it's recommended to read back from the exact same address which you've written to _provided_ there's no side effects from doing so. If there are, you have to find some other solution to it. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-03 10:24 ` Russell King - ARM Linux @ 2011-03-09 4:58 ` Saravana Kannan 2011-03-09 8:05 ` Russell King - ARM Linux 0 siblings, 1 reply; 12+ messages in thread From: Saravana Kannan @ 2011-03-09 4:58 UTC (permalink / raw) To: linux-arm-kernel On 03/03/2011 02:24 AM, Russell King - ARM Linux wrote: > On Wed, Mar 02, 2011 at 11:49:47PM -0800, Saravana Kannan wrote: >>> I think you misunderstand what's going on. IO accesses are always ordered >>> with respect to themselves. The barriers are there to ensure ordering >>> between DMA coherent memory (normal non-cached memory) and IO accesses >>> (device). >> >> Unfortunately this is not correct. The ARM spec doesn't guarantee that >> all IO accesses should be ordered with respect to themselves. It only >> requires that the ordering should be guaranteed at least within a 1KB >> region. >> >> You can find this info in ARMv7 ARM spec[1] named >> "DDI0406B_arm_architecture_reference_manual_errata_markup_8_0.pdf", on >> page A3-45. There is a para that goes: >> >> "Accesses must arrive at any particular memory-mapped peripheral or >> block of memory in program order, that is, A1 must arrive before A2. >> There are no ordering restrictions about when accesses arrive at >> different peripherals or blocks of memory, provided that the accesses >> follow the general ordering rules given in this section." > > That is news to me. My DDI0406B does not have this paragraph, so it's > something that ARM has sprung upon us without telling *anyone* about it. > It's not unreasonable or even unexpected. That is exactly the same > condition which applies on buses like PCI due to write posting on bridges > downstream of the CPU, and issuing memory barriers will not help with > that. While the PCI stuff is true, as you say, it's not related to mb()s. The mb()s matter to the point of getting the writes to the intended "devices addresses" in the program order. What happens after that is a separate issue. So, going back to the discussion of mb()s and readl/writel (and variations), I think we should no longer state the all IO accesses are ordered wrt each other. Are we in agreement on this? Thanks, Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-09 4:58 ` Saravana Kannan @ 2011-03-09 8:05 ` Russell King - ARM Linux 2011-03-09 9:32 ` Saravana Kannan 0 siblings, 1 reply; 12+ messages in thread From: Russell King - ARM Linux @ 2011-03-09 8:05 UTC (permalink / raw) To: linux-arm-kernel On Tue, Mar 08, 2011 at 08:58:20PM -0800, Saravana Kannan wrote: > On 03/03/2011 02:24 AM, Russell King - ARM Linux wrote: >> On Wed, Mar 02, 2011 at 11:49:47PM -0800, Saravana Kannan wrote: >>>> I think you misunderstand what's going on. IO accesses are always ordered >>>> with respect to themselves. The barriers are there to ensure ordering >>>> between DMA coherent memory (normal non-cached memory) and IO accesses >>>> (device). >>> >>> Unfortunately this is not correct. The ARM spec doesn't guarantee that >>> all IO accesses should be ordered with respect to themselves. It only >>> requires that the ordering should be guaranteed at least within a 1KB >>> region. >>> >>> You can find this info in ARMv7 ARM spec[1] named >>> "DDI0406B_arm_architecture_reference_manual_errata_markup_8_0.pdf", on >>> page A3-45. There is a para that goes: >>> >>> "Accesses must arrive at any particular memory-mapped peripheral or >>> block of memory in program order, that is, A1 must arrive before A2. >>> There are no ordering restrictions about when accesses arrive at >>> different peripherals or blocks of memory, provided that the accesses >>> follow the general ordering rules given in this section." >> >> That is news to me. My DDI0406B does not have this paragraph, so it's >> something that ARM has sprung upon us without telling *anyone* about it. >> It's not unreasonable or even unexpected. That is exactly the same >> condition which applies on buses like PCI due to write posting on bridges >> downstream of the CPU, and issuing memory barriers will not help with >> that. > > While the PCI stuff is true, as you say, it's not related to mb()s. The > mb()s matter to the point of getting the writes to the intended > "devices addresses" in the program order. What happens after that is a > separate issue. > > So, going back to the discussion of mb()s and readl/writel (and > variations), I think we should no longer state the all IO accesses are > ordered wrt each other. Are we in agreement on this? No, because you clearly haven't understood the point I made. I still believe you are wrong on this point. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-09 8:05 ` Russell King - ARM Linux @ 2011-03-09 9:32 ` Saravana Kannan 2011-03-09 9:38 ` Russell King - ARM Linux 0 siblings, 1 reply; 12+ messages in thread From: Saravana Kannan @ 2011-03-09 9:32 UTC (permalink / raw) To: linux-arm-kernel On Wed, March 9, 2011 12:05 am, Russell King - ARM Linux wrote: > On Tue, Mar 08, 2011 at 08:58:20PM -0800, Saravana Kannan wrote: >> On 03/03/2011 02:24 AM, Russell King - ARM Linux wrote: >>> On Wed, Mar 02, 2011 at 11:49:47PM -0800, Saravana Kannan wrote: >>>>> I think you misunderstand what's going on. IO accesses are always >>>>> ordered >>>>> with respect to themselves. The barriers are there to ensure >>>>> ordering >>>>> between DMA coherent memory (normal non-cached memory) and IO >>>>> accesses >>>>> (device). >>>> >>>> Unfortunately this is not correct. The ARM spec doesn't guarantee that >>>> all IO accesses should be ordered with respect to themselves. It only >>>> requires that the ordering should be guaranteed at least within a 1KB >>>> region. >>>> >>>> You can find this info in ARMv7 ARM spec[1] named >>>> "DDI0406B_arm_architecture_reference_manual_errata_markup_8_0.pdf", on >>>> page A3-45. There is a para that goes: >>>> >>>> "Accesses must arrive at any particular memory-mapped peripheral or >>>> block of memory in program order, that is, A1 must arrive before A2. >>>> There are no ordering restrictions about when accesses arrive at >>>> different peripherals or blocks of memory, provided that the accesses >>>> follow the general ordering rules given in this section." >>> >>> That is news to me. My DDI0406B does not have this paragraph, so it's >>> something that ARM has sprung upon us without telling *anyone* about >>> it. >>> It's not unreasonable or even unexpected. That is exactly the same >>> condition which applies on buses like PCI due to write posting on >>> bridges >>> downstream of the CPU, and issuing memory barriers will not help with >>> that. >> >> While the PCI stuff is true, as you say, it's not related to mb()s. The >> mb()s matter to the point of getting the writes to the intended >> "devices addresses" in the program order. What happens after that is a >> separate issue. >> >> So, going back to the discussion of mb()s and readl/writel (and >> variations), I think we should no longer state the all IO accesses are >> ordered wrt each other. Are we in agreement on this? > > No, because you clearly haven't understood the point I made. > > I still believe you are wrong on this point. I'm not going to pretend to be a PCI expert, but I think the ARMv7 ARM text I quoted makes it pretty clear that all IO accesses are not ordered wrt each other. So, why do you still disagree on that point? -Saravana -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 12+ messages in thread
* CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness 2011-03-09 9:32 ` Saravana Kannan @ 2011-03-09 9:38 ` Russell King - ARM Linux 0 siblings, 0 replies; 12+ messages in thread From: Russell King - ARM Linux @ 2011-03-09 9:38 UTC (permalink / raw) To: linux-arm-kernel On Wed, Mar 09, 2011 at 01:32:15AM -0800, Saravana Kannan wrote: > On Wed, March 9, 2011 12:05 am, Russell King - ARM Linux wrote: > > On Tue, Mar 08, 2011 at 08:58:20PM -0800, Saravana Kannan wrote: > >> On 03/03/2011 02:24 AM, Russell King - ARM Linux wrote: > >>> On Wed, Mar 02, 2011 at 11:49:47PM -0800, Saravana Kannan wrote: > >>>>> I think you misunderstand what's going on. IO accesses are always > >>>>> ordered > >>>>> with respect to themselves. The barriers are there to ensure > >>>>> ordering > >>>>> between DMA coherent memory (normal non-cached memory) and IO > >>>>> accesses > >>>>> (device). > >>>> > >>>> Unfortunately this is not correct. The ARM spec doesn't guarantee that > >>>> all IO accesses should be ordered with respect to themselves. It only > >>>> requires that the ordering should be guaranteed at least within a 1KB > >>>> region. > >>>> > >>>> You can find this info in ARMv7 ARM spec[1] named > >>>> "DDI0406B_arm_architecture_reference_manual_errata_markup_8_0.pdf", on > >>>> page A3-45. There is a para that goes: > >>>> > >>>> "Accesses must arrive at any particular memory-mapped peripheral or > >>>> block of memory in program order, that is, A1 must arrive before A2. > >>>> There are no ordering restrictions about when accesses arrive at > >>>> different peripherals or blocks of memory, provided that the accesses > >>>> follow the general ordering rules given in this section." > >>> > >>> That is news to me. My DDI0406B does not have this paragraph, so it's > >>> something that ARM has sprung upon us without telling *anyone* about > >>> it. > >>> It's not unreasonable or even unexpected. That is exactly the same > >>> condition which applies on buses like PCI due to write posting on > >>> bridges > >>> downstream of the CPU, and issuing memory barriers will not help with > >>> that. > >> > >> While the PCI stuff is true, as you say, it's not related to mb()s. The > >> mb()s matter to the point of getting the writes to the intended > >> "devices addresses" in the program order. What happens after that is a > >> separate issue. > >> > >> So, going back to the discussion of mb()s and readl/writel (and > >> variations), I think we should no longer state the all IO accesses are > >> ordered wrt each other. Are we in agreement on this? > > > > No, because you clearly haven't understood the point I made. > > > > I still believe you are wrong on this point. > > I'm not going to pretend to be a PCI expert, but I think the ARMv7 ARM > text I quoted makes it pretty clear that all IO accesses are not ordered > wrt each other. So, why do you still disagree on that point? Because you're entirely missing my point, and you don't need to be a PCI expert to understand it. You're just not taking the time to think about it because it says "PCI" and I guess you're not interested in PCI. The point I made is that even on a strongly ordered CPU, accesses to PCI devices on different busses can be _out_ _of_ _order_. This is something we expect. This is something we deal with by reading back from the device after a write _if_ the driver deems that the relative ordering matters. That's a decision for the device driver to make - and the _only_ way to do that is by reading back from the device. Adding memory barriers do *not* help, and if your 'ARM expert' thinks it does then he's wrong. I will not change the CONFIG_ARM_DMA_MEM_BUFFERABLE text - this config option is about the type of memory used for the DMA memory, and not about IO access ordering. To make it so confuses the two issues. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-03-09 9:38 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-02 1:23 CONFIG_ARM_DMA_MEM_BUFFERABLE and readl/writel weirdness Saravana Kannan 2011-03-02 8:23 ` Arnd Bergmann 2011-03-03 7:57 ` Saravana Kannan 2011-03-02 8:39 ` Russell King - ARM Linux 2011-03-03 7:49 ` Saravana Kannan 2011-03-03 10:11 ` Catalin Marinas 2011-03-09 4:37 ` Saravana Kannan 2011-03-03 10:24 ` Russell King - ARM Linux 2011-03-09 4:58 ` Saravana Kannan 2011-03-09 8:05 ` Russell King - ARM Linux 2011-03-09 9:32 ` Saravana Kannan 2011-03-09 9:38 ` Russell King - ARM Linux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).