public inbox for linux-omap@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ARM MMU: add strongly-ordered memory type
@ 2008-08-04 23:40 Paul Walmsley
  2008-08-05 11:49 ` Ben Dooks
  2008-08-06  9:53 ` Catalin Marinas
  0 siblings, 2 replies; 24+ messages in thread
From: Paul Walmsley @ 2008-08-04 23:40 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-omap, r-woodruff2


Add the MT_MEMORY_STRONGLY_ORDERED memory type for ARM strongly ordered
memory.

This is used on OMAP3 for on-board SRAM.  On OMAP, SRAM is used for code 
that changes the SDRAM controller's clock, temporarily blocking access to 
SDRAM.  During this period, as code executes from SRAM, the ARM cache 
controller can attempt to write dirty cache lines back to SDRAM to make 
room for SRAM cache lines, causing the MPU subsystem to hang.  To avoid 
this, we mark SRAM as strongly- ordered memory.

Problem noted by Richard Woodruff <r-woodruff2@ti.com>.  Fix derived
from the TI CDP codebase.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
---

 arch/arm/mm/mmu.c          |    5 +++++
 include/asm-arm/mach/map.h |   13 +++++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 2d6d682..5b56539 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -239,6 +239,11 @@ static struct mem_type mem_types[] = {
 		.prot_sect = PMD_TYPE_SECT,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_STRONGLY_ORDERED] = {
+		.prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
+				PMD_SECT_UNCACHED,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
diff --git a/include/asm-arm/mach/map.h b/include/asm-arm/mach/map.h
index 7ef3c83..8cb46b7 100644
--- a/include/asm-arm/mach/map.h
+++ b/include/asm-arm/mach/map.h
@@ -19,12 +19,13 @@ struct map_desc {
 };
 
 /* types 0-3 are defined in asm/io.h */
-#define MT_CACHECLEAN		4
-#define MT_MINICLEAN		5
-#define MT_LOW_VECTORS		6
-#define MT_HIGH_VECTORS		7
-#define MT_MEMORY		8
-#define MT_ROM			9
+#define MT_CACHECLEAN			4
+#define MT_MINICLEAN			5
+#define MT_LOW_VECTORS			6
+#define MT_HIGH_VECTORS			7
+#define MT_MEMORY			8
+#define MT_ROM				9
+#define MT_MEMORY_STRONGLY_ORDERED	10
 
 #define MT_NONSHARED_DEVICE	MT_DEVICE_NONSHARED
 #define MT_IXP2000_DEVICE	MT_DEVICE_IXP2000


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-04 23:40 [PATCH] ARM MMU: add strongly-ordered memory type Paul Walmsley
@ 2008-08-05 11:49 ` Ben Dooks
  2008-08-05 12:15   ` Woodruff, Richard
  2008-08-06  9:53 ` Catalin Marinas
  1 sibling, 1 reply; 24+ messages in thread
From: Ben Dooks @ 2008-08-05 11:49 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-arm-kernel, linux-omap, r-woodruff2

On Mon, Aug 04, 2008 at 05:40:57PM -0600, Paul Walmsley wrote:
> 
> Add the MT_MEMORY_STRONGLY_ORDERED memory type for ARM strongly ordered
> memory.
> 
> This is used on OMAP3 for on-board SRAM.  On OMAP, SRAM is used for code 
> that changes the SDRAM controller's clock, temporarily blocking access to 
> SDRAM.  During this period, as code executes from SRAM, the ARM cache 
> controller can attempt to write dirty cache lines back to SDRAM to make 
> room for SRAM cache lines, causing the MPU subsystem to hang.  To avoid 
> this, we mark SRAM as strongly- ordered memory.

Is the controller allowed to write dirty cache lines out at any time it
likes? Surely a better fix is to drain the cache of the changes before
changing the clock for the SDRAM?
 
> Problem noted by Richard Woodruff <r-woodruff2@ti.com>.  Fix derived
> from the TI CDP codebase.
> 
> Signed-off-by: Paul Walmsley <paul@pwsan.com>
> ---
> 
>  arch/arm/mm/mmu.c          |    5 +++++
>  include/asm-arm/mach/map.h |   13 +++++++------
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 2d6d682..5b56539 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -239,6 +239,11 @@ static struct mem_type mem_types[] = {
>  		.prot_sect = PMD_TYPE_SECT,
>  		.domain    = DOMAIN_KERNEL,
>  	},
> +	[MT_MEMORY_STRONGLY_ORDERED] = {
> +		.prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
> +				PMD_SECT_UNCACHED,
> +		.domain    = DOMAIN_KERNEL,
> +	},
>  };
>  
>  const struct mem_type *get_mem_type(unsigned int type)
> diff --git a/include/asm-arm/mach/map.h b/include/asm-arm/mach/map.h
> index 7ef3c83..8cb46b7 100644
> --- a/include/asm-arm/mach/map.h
> +++ b/include/asm-arm/mach/map.h
> @@ -19,12 +19,13 @@ struct map_desc {
>  };
>  
>  /* types 0-3 are defined in asm/io.h */
> -#define MT_CACHECLEAN		4
> -#define MT_MINICLEAN		5
> -#define MT_LOW_VECTORS		6
> -#define MT_HIGH_VECTORS		7
> -#define MT_MEMORY		8
> -#define MT_ROM			9
> +#define MT_CACHECLEAN			4
> +#define MT_MINICLEAN			5
> +#define MT_LOW_VECTORS			6
> +#define MT_HIGH_VECTORS			7
> +#define MT_MEMORY			8
> +#define MT_ROM				9
> +#define MT_MEMORY_STRONGLY_ORDERED	10
>  
>  #define MT_NONSHARED_DEVICE	MT_DEVICE_NONSHARED
>  #define MT_IXP2000_DEVICE	MT_DEVICE_IXP2000
> 
> 
> -------------------------------------------------------------------
> List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
> FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
> Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

-- 
-- 
Ben

Q:      What's a light-year?
A:      One-third less calories than a regular year.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-05 11:49 ` Ben Dooks
@ 2008-08-05 12:15   ` Woodruff, Richard
  2008-08-06 10:20     ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Woodruff, Richard @ 2008-08-05 12:15 UTC (permalink / raw)
  To: Ben Dooks, Paul Walmsley
  Cc: linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

> Is the controller allowed to write dirty cache lines out at any time
> it
> likes? Surely a better fix is to drain the cache of the changes before
> changing the clock for the SDRAM?

- Previously the SRAM was marked as cached.  _Execution_ using that attribute will generate line fetches to SRAM.  This will cause displacement write-outs of resident DDR lines.  Similarly, _load/store_ sequences in that code have the same effect.  This cast outs dead lock the CPU and it can't fetch to progress.

- In addition to the DVFS path there is a context-restore path which also needs to live in non-cached SRAM.  It will reset and reprogram the registers.  You can't completely re-program the DDR controller from the same memory it is executing from.

* You can go through all the trouble of pre-loading and locking to cache for the same code then unlocking but it adds unnecessary complexity.  It also exposes you to lots of early processor errata.

* Flushing the entire L1 & L2 cache frequently is very expensive and better not done if you don't have.  Also, it is not sufficient for the context-restore path which needs to NOT live in DDR.

The code need to execute in a non-cached region.

Regards,
Richard W.



-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-04 23:40 [PATCH] ARM MMU: add strongly-ordered memory type Paul Walmsley
  2008-08-05 11:49 ` Ben Dooks
@ 2008-08-06  9:53 ` Catalin Marinas
  2008-08-06 12:21   ` Woodruff, Richard
  1 sibling, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2008-08-06  9:53 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-arm-kernel, linux-omap, r-woodruff2

On Mon, 2008-08-04 at 17:40 -0600, Paul Walmsley wrote:
> Add the MT_MEMORY_STRONGLY_ORDERED memory type for ARM strongly ordered
> memory.
> 
> This is used on OMAP3 for on-board SRAM.  On OMAP, SRAM is used for code 
> that changes the SDRAM controller's clock, temporarily blocking access to 
> SDRAM.  During this period, as code executes from SRAM, the ARM cache 
> controller can attempt to write dirty cache lines back to SDRAM to make 
> room for SRAM cache lines, causing the MPU subsystem to hang.  To avoid 
> this, we mark SRAM as strongly- ordered memory.

Why not use normal uncached memory? Strongly ordered is pretty
inefficient as it cannot do any reordering or write buffer merging (it's
like having a memory barrier before and after each instruction).
Speculative accesses are not allowed either. Strongly ordered memory is
not really meant for executing code from.

> +	[MT_MEMORY_STRONGLY_ORDERED] = {
> +		.prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE |
> +				PMD_SECT_UNCACHED,

You can add PMD_SECT_TEX(1) for normal uncached memory.

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-05 12:15   ` Woodruff, Richard
@ 2008-08-06 10:20     ` Catalin Marinas
  2008-08-06 12:28       ` Woodruff, Richard
  2008-08-07  6:01       ` Paul Walmsley
  0 siblings, 2 replies; 24+ messages in thread
From: Catalin Marinas @ 2008-08-06 10:20 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Ben Dooks, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Tue, 2008-08-05 at 07:15 -0500, Woodruff, Richard wrote:
> > Is the controller allowed to write dirty cache lines out at any time
> > it
> > likes? Surely a better fix is to drain the cache of the changes before
> > changing the clock for the SDRAM?
> 
> - Previously the SRAM was marked as cached.  _Execution_ using that
> attribute will generate line fetches to SRAM.  This will cause
> displacement write-outs of resident DDR lines.  Similarly,
> _load/store_ sequences in that code have the same effect.  This cast
> outs dead lock the CPU and it can't fetch to progress.
[...]
> * Flushing the entire L1 & L2 cache frequently is very expensive and
> better not done if you don't have.  Also, it is not sufficient for the
> context-restore path which needs to NOT live in DDR.
> 
> The code need to execute in a non-cached region.

I don't think there is any guarantee that dirty cache line won't be
evicted to SDRAM even if your code uses uncached memory only. The CPU is
allowed to do speculative reads from the normal cached memory and these
reads may force a dirty cache line to be written back to memory. You may
need to do at least a cache clean operation (invalidate not necessary).

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-06  9:53 ` Catalin Marinas
@ 2008-08-06 12:21   ` Woodruff, Richard
  2008-08-07  7:30     ` Russell King - ARM Linux
  0 siblings, 1 reply; 24+ messages in thread
From: Woodruff, Richard @ 2008-08-06 12:21 UTC (permalink / raw)
  To: Catalin Marinas, Paul Walmsley
  Cc: linux-arm-kernel@lists.arm.linux.org.uk,
	linux-omap@vger.kernel.org

> Why not use normal uncached memory? Strongly ordered is pretty
> inefficient as it cannot do any reordering or write buffer merging
> (it's
> like having a memory barrier before and after each instruction).
> Speculative accesses are not allowed either. Strongly ordered memory
> is
> not really meant for executing code from.

It could be.  This is a discussion we were having off line.

The code in question is a small bit of assembly interacting with hardware mainly and has not been audited for full pipeline/buffering correctness.

Most of the weak memory attributes in newer ARMs are not exploited today in tree.  I'll guess this was more a correctness and capability judgment from Russell.

Regards,
Richard W.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-06 10:20     ` Catalin Marinas
@ 2008-08-06 12:28       ` Woodruff, Richard
  2008-08-07 16:55         ` Catalin Marinas
  2008-08-07  6:01       ` Paul Walmsley
  1 sibling, 1 reply; 24+ messages in thread
From: Woodruff, Richard @ 2008-08-06 12:28 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ben Dooks, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

> > The code need to execute in a non-cached region.
>
> I don't think there is any guarantee that dirty cache line won't be
> evicted to SDRAM even if your code uses uncached memory only. The CPU
> is
> allowed to do speculative reads from the normal cached memory and
> these
> reads may force a dirty cache line to be written back to memory. You
> may
> need to do at least a cache clean operation (invalidate not necessary).

I hope that is not necessary.

As I indicated the code in question is small assembly dealing with hardware registers for clock and power.

This code at one point did make sure use barriers at entry which hopefully stop the ARM from being smart.  Also things like speculative access at the AXI were turned off.

And perhaps you even make my point in 'The CPU is allowed to do speculative reads from normal-cached-memory'.
        !! I don't want some speculative branch prediction to fetch from who knows where when I've temporally shut off the memory controller !!

We spent too long fighting bugs around random crashes here already :) In future chips some of this is moving into hardware sequencers out of ARM control.

Regards,
Richard W.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-06 10:20     ` Catalin Marinas
  2008-08-06 12:28       ` Woodruff, Richard
@ 2008-08-07  6:01       ` Paul Walmsley
  2008-08-07 16:45         ` Catalin Marinas
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Walmsley @ 2008-08-07  6:01 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Woodruff, Richard, tony, Ben Dooks, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

Hello Catalin,

On Wed, 6 Aug 2008, Catalin Marinas wrote:

> On Tue, 2008-08-05 at 07:15 -0500, Woodruff, Richard wrote:
> > > Is the controller allowed to write dirty cache lines out at any time
> > > it
> > > likes? Surely a better fix is to drain the cache of the changes before
> > > changing the clock for the SDRAM?
> > 
> > - Previously the SRAM was marked as cached.  _Execution_ using that
> > attribute will generate line fetches to SRAM.  This will cause
> > displacement write-outs of resident DDR lines.  Similarly,
> > _load/store_ sequences in that code have the same effect.  This cast
> > outs dead lock the CPU and it can't fetch to progress.
> [...]
> > * Flushing the entire L1 & L2 cache frequently is very expensive and
> > better not done if you don't have.  Also, it is not sufficient for the
> > context-restore path which needs to NOT live in DDR.
> > 
> > The code need to execute in a non-cached region.
> 
> I don't think there is any guarantee that dirty cache line won't be
> evicted to SDRAM even if your code uses uncached memory only. The CPU is
> allowed to do speculative reads from the normal cached memory and these
> reads may force a dirty cache line to be written back to memory. You may
> need to do at least a cache clean operation (invalidate not necessary).

If we turn off speculative reads via a CP15 control register Z-bit write 
for the duration of the SRAM code execution, and use either normal 
non-cached memory or strongly-ordered memory for the SRAM code, will that 
effectively prevent any cache line writeback during that time?  (assuming 
interrupts are disabled, that is).

Also, a somewhat-related question about strongly-ordered memory regions: 
these are described as bracketing accesses to those regions with data 
memory barriers.  In this case, are those barriers specific to the 
strongly-ordered pages, or will they affect all memory transactions, even 
to normal cached memory?


- Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-06 12:21   ` Woodruff, Richard
@ 2008-08-07  7:30     ` Russell King - ARM Linux
  2008-08-07 16:01       ` Catalin Marinas
  2008-08-07 18:56       ` Woodruff, Richard
  0 siblings, 2 replies; 24+ messages in thread
From: Russell King - ARM Linux @ 2008-08-07  7:30 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Wed, Aug 06, 2008 at 07:21:14AM -0500, Woodruff, Richard wrote:
> Most of the weak memory attributes in newer ARMs are not exploited
> today in tree.  I'll guess this was more a correctness and capability
> judgment from Russell.

Not entirely true.  We do as much as is safe to do - which is basically
using 'device' mappings for devices and ioremap, and 'memory' mappings
for the main memory, module and vmalloc mappings.

What we don't do is mark DMA memory ask being normal uncached memory,
thereby allowing that to be reordered with device accesses - we make
it strongly ordered.  The reason being that the kernel doesn't have
barriers necessary to ensure that writes to DMA memory hit physical
memory before the device access to enable DMA hits the DMA controller.

Those kinds of bugs can be absolute hell to track down - think about
a DMA controller accessing an uninitialised DMA descriptor, resulting
in it scribbing over random bits of memory.

The only real way to do this is to audit lots of drivers to ensure
that:

1. DMA is not started until accesses to memory allocated by
   dma_alloc_coherent() have hit memory
2. accesses to dma_alloc_coherent() memory always read current data, even
   if the DMA controller has just updated the descriptor you're reading.

Linux presently - and quite rightly - assumes that accesses to DMA
coherent memory _are_ coherent with DMA.  If not, the API would be
a joke.

-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07  7:30     ` Russell King - ARM Linux
@ 2008-08-07 16:01       ` Catalin Marinas
  2008-08-07 18:56       ` Woodruff, Richard
  1 sibling, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2008-08-07 16:01 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Woodruff, Richard, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Thu, 2008-08-07 at 08:30 +0100, Russell King - ARM Linux wrote:
> What we don't do is mark DMA memory ask being normal uncached memory,
> thereby allowing that to be reordered with device accesses - we make
> it strongly ordered.  The reason being that the kernel doesn't have
> barriers necessary to ensure that writes to DMA memory hit physical
> memory before the device access to enable DMA hits the DMA controller.

We have mb() and related which provides the ordering (there is also
mmiowb() but my understanding is that we don't need this on ARM).

http://lwn.net/Articles/283776/

> Those kinds of bugs can be absolute hell to track down - think about
> a DMA controller accessing an uninitialised DMA descriptor, resulting
> in it scribbing over random bits of memory.

Yes, indeed, but ARM is not the only architecture with a weak memory
ordering model so drivers should be fixed, in theory.

> Linux presently - and quite rightly - assumes that accesses to DMA
> coherent memory _are_ coherent with DMA.  If not, the API would be
> a joke.

As I understand it, the DMA mapping doesn't guarantee any ordering,
drivers must use barriers. According to Documentation/DMA-mapping.txt:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

[...]

  IMPORTANT: Consistent DMA memory does not preclude the usage of
             proper memory barriers.  The CPU may reorder stores to
	     consistent memory just as it may normal memory.

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07  6:01       ` Paul Walmsley
@ 2008-08-07 16:45         ` Catalin Marinas
  2008-08-08  8:45           ` Paul Walmsley
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2008-08-07 16:45 UTC (permalink / raw)
  To: Paul Walmsley
  Cc: Woodruff, Richard, tony, Ben Dooks, linux-omap, linux-arm-kernel

On Thu, 2008-08-07 at 07:01 +0100, Paul Walmsley wrote:
> On Wed, 6 Aug 2008, Catalin Marinas wrote:
> 
> > On Tue, 2008-08-05 at 07:15 -0500, Woodruff, Richard wrote:
> > > > Is the controller allowed to write dirty cache lines out at any time
> > > > it
> > > > likes? Surely a better fix is to drain the cache of the changes before
> > > > changing the clock for the SDRAM?
> > >
> > > - Previously the SRAM was marked as cached.  _Execution_ using that
> > > attribute will generate line fetches to SRAM.  This will cause
> > > displacement write-outs of resident DDR lines.  Similarly,
> > > _load/store_ sequences in that code have the same effect.  This cast
> > > outs dead lock the CPU and it can't fetch to progress.
> > [...]
> > > * Flushing the entire L1 & L2 cache frequently is very expensive and
> > > better not done if you don't have.  Also, it is not sufficient for the
> > > context-restore path which needs to NOT live in DDR.
> > >
> > > The code need to execute in a non-cached region.
> >
> > I don't think there is any guarantee that dirty cache line won't be
> > evicted to SDRAM even if your code uses uncached memory only. The CPU is
> > allowed to do speculative reads from the normal cached memory and these
> > reads may force a dirty cache line to be written back to memory. You may
> > need to do at least a cache clean operation (invalidate not necessary).
> 
> If we turn off speculative reads via a CP15 control register Z-bit write
> for the duration of the SRAM code execution, and use either normal
> non-cached memory or strongly-ordered memory for the SRAM code, will that
> effectively prevent any cache line writeback during that time?  (assuming
> interrupts are disabled, that is).

The Z bit disables the branch prediction but I think prefetches into the
pipeline are still allowed. After the last instruction enabling the
SDRAM controller, you might have to put a number of nops (depending on
the pipeline size) to ensure that you don't get a prefetch beyond your
code, though with branch prediction disabled it shouldn't prefetch from
SDRAM. You probably need additional barriers like ISB and DSB as well.

Many of the architecture people in ARM seem to be on holiday, I'll try
to get clarification in about a week time.

> Also, a somewhat-related question about strongly-ordered memory regions:
> these are described as bracketing accesses to those regions with data
> memory barriers.  In this case, are those barriers specific to the
> strongly-ordered pages, or will they affect all memory transactions, even
> to normal cached memory?

For an observer outside the CPU (and not part of the cache coherency
domain), accesses to strongly-ordered memory are strictly ordered
relative to any uncached memory type. The ordering happens at the
write-buffer level but for normal cached memory a store may end up in
the cache and hence not visible from outside. In this case, you need to
flush the cache followed by a DSB to ensure the termination of the cache
operation.

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-06 12:28       ` Woodruff, Richard
@ 2008-08-07 16:55         ` Catalin Marinas
  0 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2008-08-07 16:55 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Ben Dooks, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

> On Wed, 2008-08-06 at 07:28 -0500, Woodruff, Richard wrote:
> > > The code need to execute in a non-cached region.
> >
> > I don't think there is any guarantee that dirty cache line won't be
> > evicted to SDRAM even if your code uses uncached memory only. The CPU
> > is
> > allowed to do speculative reads from the normal cached memory and
> > these
> > reads may force a dirty cache line to be written back to memory. You
> > may
> > need to do at least a cache clean operation (invalidate not necessary).
> 
> I hope that is not necessary.

Probably not but need to check internally as well.

> As I indicated the code in question is small assembly dealing with
> hardware registers for clock and power.
> 
> This code at one point did make sure use barriers at entry which
> hopefully stop the ARM from being smart.  Also things like speculative
> access at the AXI were turned off.

You would need a DSB at the beginning to drain the write buffer (among
other things)

> And perhaps you even make my point in 'The CPU is allowed to do
> speculative reads from normal-cached-memory'.
>         !! I don't want some speculative branch prediction to fetch
> from who knows where when I've temporally shut off the memory
> controller !!

If you have a DMB at the end of your code, this ensures that all memory
accesses after the DMB (i.e. accesses to the SDRAM) are visible after
accesses before the barrier (i.e. the enabling of the SDRAM controller).

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07  7:30     ` Russell King - ARM Linux
  2008-08-07 16:01       ` Catalin Marinas
@ 2008-08-07 18:56       ` Woodruff, Richard
  2008-08-07 19:25         ` Russell King - ARM Linux
  1 sibling, 1 reply; 24+ messages in thread
From: Woodruff, Richard @ 2008-08-07 18:56 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Catalin Marinas, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk


> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> > Most of the weak memory attributes in newer ARMs are not exploited
> > today in tree.  I'll guess this was more a correctness and
> capability
> > judgment from Russell.
>
> Not entirely true.  We do as much as is safe to do - which is
> basically
> using 'device' mappings for devices and ioremap, and 'memory' mappings
> for the main memory, module and vmalloc mappings.

Is DEVICE really safe for things other than FIFOs with out the use of barriers?  Accesses are in order but they can be buffered.  The bus is free to post/buffer these writes as long as it preserves the order (and it does). I want to know the write happened before moving on in some code.  Yes some barrier can be added to the code to fix it but its not there in most drivers today.

I recall you gave an explanation on list a while back where MPCORE had to use DEVICE or it wouldn't work.  That restriction is not there for others.

We do in some drivers today get spurious interrupts when DEVICE is used but don't see them when using SO.

Originally the IC-Architect wanted two memory windows per device, one SO for register control and one DEVICE for FIFO access.  Given that we do DMA (which doesn't care about how ARM sees the world) on the performance hungry devices not doing this was ok.

> What we don't do is mark DMA memory ask being normal uncached memory,
> thereby allowing that to be reordered with device accesses - we make
> it strongly ordered.  The reason being that the kernel doesn't have
> barriers necessary to ensure that writes to DMA memory hit physical
> memory before the device access to enable DMA hits the DMA controller.

For an experiment a couple years back we did convert the dma alloc pool addresses as NC.  All worked -except- for OHCI-USB which started failing some tests.

Regards,
Richard W.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 18:56       ` Woodruff, Richard
@ 2008-08-07 19:25         ` Russell King - ARM Linux
  2008-08-07 20:38           ` Woodruff, Richard
  0 siblings, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2008-08-07 19:25 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Catalin Marinas, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Thu, Aug 07, 2008 at 01:56:40PM -0500, Woodruff, Richard wrote:
> > From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> > > Most of the weak memory attributes in newer ARMs are not exploited
> > > today in tree.  I'll guess this was more a correctness and capability
> > > judgment from Russell.
> >
> > Not entirely true.  We do as much as is safe to do - which is
> > basically
> > using 'device' mappings for devices and ioremap, and 'memory' mappings
> > for the main memory, module and vmalloc mappings.
> 
> Is DEVICE really safe for things other than FIFOs with out the use of
> barriers?

As far as I'm aware, yes - and that comment is based solely upon the
fact that no one has reported any problems with the kernel which have
been tracked down to using the device memory type on ARMv6 and above...

> We do in some drivers today get spurious interrupts when DEVICE is
> used but don't see them when using SO.

... until now, or even that very sentence.

That's not unexpected if you don't have the right barriers in place
at the end of things such as IRQ controllers ack/mask functions.

Can you give me more information - which OMAP platform, which IRQ
controller, which device is easiest to provoke this behaviour, and
I'll look at it.

> Originally the IC-Architect wanted two memory windows per device, one
> SO for register control and one DEVICE for FIFO access.  Given that we
> do DMA (which doesn't care about how ARM sees the world) on the
> performance hungry devices not doing this was ok.

I'm not sure what point you're making there.

> For an experiment a couple years back we did convert the dma alloc
> pool addresses as NC.  All worked -except- for OHCI-USB which started
> failing some tests.

If we go down the route of marking DMA as 'normal memory non-cacheable'
we're going to have a never ending stream of drivers which don't work
correctly.  We're forever going to be bug hunting drivers, having to
add barriers as required.  Arguably those barriers should be there
already, but if drivers are developed on platforms without weak ordering,
authors just don't think about it, and _certainly_ can't test them.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 19:25         ` Russell King - ARM Linux
@ 2008-08-07 20:38           ` Woodruff, Richard
  2008-08-07 21:20             ` Russell King - ARM Linux
  0 siblings, 1 reply; 24+ messages in thread
From: Woodruff, Richard @ 2008-08-07 20:38 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Catalin Marinas, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk


> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> > Is DEVICE really safe for things other than FIFOs with out the use of
> > barriers?
>
> As far as I'm aware, yes - and that comment is based solely upon the
> fact that no one has reported any problems with the kernel which have
> been tracked down to using the device memory type on ARMv6 and above...
>
> > We do in some drivers today get spurious interrupts when DEVICE is
> > used but don't see them when using SO.
>
> ... until now, or even that very sentence.

That is our fault then I suppose for not discussing this on arm-linux. In OMAP2 and OMAP3 this has been observed.  In vendor kernels where time stands still and lots of validation has happened we did stick with SO for OMAP2.  On some internal kernels already we have gone to SO for OMAP3 as customers ramp and need the errors gone.  The faster the system clocks the more it seems to show.

The thing with these effects, especially spurious IRQs is there usually are several reasons they show up and several ways to make them go away.  In the beginning there have been lots then they drop off as the system software matures.  Then if the program survives long enough to be optimized they start to show up again but in lesser numbers.  This has been the OMAP2/3 experience so far.  Going SO to control regions has stamped them out at this point.

> That's not unexpected if you don't have the right barriers in place
> at the end of things such as IRQ controllers ack/mask functions.

Yes. I've submitted patches (to linux-omap) and Catalin did submit patches (to arm-linux) for PIC barriers.  In the past they have been rejected by Tony or you for different reasons.  Tony last rejected it because he thought it should be generic at the ARM level.  I don't recall what your last stance was.

It is consistently observed, that with these irq-controller barriers in place, spurious irqs go down (but not necessarily away).

Our internal kernels still have this in them for OMAP2 and OMAP3.

> Can you give me more information - which OMAP platform, which IRQ
> controller, which device is easiest to provoke this behaviour, and
> I'll look at it.

Lately, a full OMAP3 system running with the 3d-GFX driver is causing these. Camera driver operation has been one which raised them on and off.  If it persists exporting an environment to you should be possible.  I expect it will take time and coordination to do this.

The pure linux-omap kernel has periodically seen spurious irqs with UART.  However, if you use the irq-controller barriers they tend to go away.

> > Originally the IC-Architect wanted two memory windows per device, one
> > SO for register control and one DEVICE for FIFO access.  Given that we
> > do DMA (which doesn't care about how ARM sees the world) on the
> > performance hungry devices not doing this was ok.
>
> I'm not sure what point you're making there.

Use a dual mapping to manage a device (2 ioremaps).  You use a SO mapping to write to registers of that device.  Then when you go to write to its FIFO use a DEVICE mapping.

Say TX IRQ happens at UART, I might check status bits through a SO mapping, but when it comes time to fill the FIFO I write to the DEVICE mapping.  Maybe I can even arrange it such that I burst in order using the natural FIFO width.  Even if you don't burst you can take advantage of the bus posting effects.  Fill the FIFO and get out of there with out a big stall time.

Like I said previously, a system likely will use DMA to the FIFO if performance matters, so not optimizing here has been the choice.

> > For an experiment a couple years back we did convert the dma alloc
> > pool addresses as NC.  All worked -except- for OHCI-USB which started
> > failing some tests.
>
> If we go down the route of marking DMA as 'normal memory non-cacheable'
> we're going to have a never ending stream of drivers which don't work
> correctly.  We're forever going to be bug hunting drivers, having to
> add barriers as required.  Arguably those barriers should be there
> already, but if drivers are developed on platforms without weak ordering,
> authors just don't think about it, and _certainly_ can't test them.

Is this just the case for an attribute to be made available from an API change/addition to allow a driver to make use of it?  The default can always be conservative.

The trend is ARMs are depending more on pipeline and prefetch tricks to perform. For these tricks to work weak memory features need to be used at times.

Regards,
Richard W.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 20:38           ` Woodruff, Richard
@ 2008-08-07 21:20             ` Russell King - ARM Linux
  2008-08-07 21:59               ` Russell King - ARM Linux
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Russell King - ARM Linux @ 2008-08-07 21:20 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Catalin Marinas, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Thu, Aug 07, 2008 at 03:38:55PM -0500, Woodruff, Richard wrote:
> 
> > From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> > > Is DEVICE really safe for things other than FIFOs with out the use of
> > > barriers?
> >
> > As far as I'm aware, yes - and that comment is based solely upon the
> > fact that no one has reported any problems with the kernel which have
> > been tracked down to using the device memory type on ARMv6 and above...
> >
> > > We do in some drivers today get spurious interrupts when DEVICE is
> > > used but don't see them when using SO.
> >
> > ... until now, or even that very sentence.
> 
> That is our fault then I suppose for not discussing this on arm-linux.
> In OMAP2 and OMAP3 this has been observed.  In vendor kernels where
> time stands still and lots of validation has happened we did stick
> with SO for OMAP2.  On some internal kernels already we have gone to
> SO for OMAP3 as customers ramp and need the errors gone.  The faster
> the system clocks the more it seems to show.

To do that, and then ask about when Linux is going to start exploiting
the weak memory types is a little unfair don't you think?

> The thing with these effects, especially spurious IRQs is there usually
> are several reasons they show up and several ways to make them go away.
> In the beginning there have been lots then they drop off as the system
> software matures.  Then if the program survives long enough to be
> optimized they start to show up again but in lesser numbers.  This has
> been the OMAP2/3 experience so far.  Going SO to control regions has
> stamped them out at this point.

What you're therefore asking for is a weak memory ordering model which
doesn't require any effort on the software programmers part - that's
a CPU architecture thing which you'll need to talk to ARM about.

x86 can do this for the most part because x86's development has been
such that the hardware has had to work around the software to make
improvements.  On ARM, normally when there's updates, software has
to work around the hardware.

> > That's not unexpected if you don't have the right barriers in place
> > at the end of things such as IRQ controllers ack/mask functions.
> 
> Yes. I've submitted patches (to linux-omap) and Catalin did submit
> patches (to arm-linux) for PIC barriers.  In the past they have been
> rejected by Tony or you for different reasons.  Tony last rejected
> it because he thought it should be generic at the ARM level.  I
> don't recall what your last stance was.

Looking back, I never commented on that patch.  I did on the previous
patch which was adding DSBs in a way which would break stuff.  The
patch to add them to the interrupt controllers has never been reposted.

However, adding barriers may not be the correct answer for this.
See Documentation/io_ordering.txt - reading back from a safe register
on the target device ensures that the previous writes should hit the
device before the read completes, without the overhead of a full
barrier.

This point is even more important if you have some form of write
posting between the CPU and the device (eg, a PCI bus) - a DSB
won't reach down to the target PCI device which may be behind some
write-posting bridges.

So, in the case of arch/arm/common/gic.c, we should be reading one of
the gic control registers after the writes.  In the case of
arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
should be a sufficient solution.

But, not a barrier.

> However, if you use the irq-controller barriers they tend to go away.

Great, so solving that should prevent them.

> > > Originally the IC-Architect wanted two memory windows per device, one
> > > SO for register control and one DEVICE for FIFO access.  Given that we
> > > do DMA (which doesn't care about how ARM sees the world) on the
> > > performance hungry devices not doing this was ok.
> >
> > I'm not sure what point you're making there.
> 
> Use a dual mapping to manage a device (2 ioremaps).  You use a SO mapping
> to write to registers of that device.  Then when you go to write to its
> FIFO use a DEVICE mapping.

I believe ARMv7 has some restrictions on dual mapping of the same
space with different types, so don't expect this technique to always
work.

> Say TX IRQ happens at UART, I might check status bits through a SO mapping,
> but when it comes time to fill the FIFO I write to the DEVICE mapping.

Why?  Firstly, the read _has_ to complete before the program can
continue.  (If it hasn't completed, you don't have the data to decide
what to do next.)  Secondly, any previous device writes will have to
complete before the read completes.

So what does reading the status bits through a SO mapping gain you?
The answer is, all other reads and writes previously issued by the
program completing.  Does that affect the status that the UART is
giving you?

> > > For an experiment a couple years back we did convert the dma alloc
> > > pool addresses as NC.  All worked -except- for OHCI-USB which started
> > > failing some tests.
> >
> > If we go down the route of marking DMA as 'normal memory non-cacheable'
> > we're going to have a never ending stream of drivers which don't work
> > correctly.  We're forever going to be bug hunting drivers, having to
> > add barriers as required.  Arguably those barriers should be there
> > already, but if drivers are developed on platforms without weak ordering,
> > authors just don't think about it, and _certainly_ can't test them.
> 
> Is this just the case for an attribute to be made available from an
> API change/addition to allow a driver to make use of it?  The default
> can always be conservative.
> 
> The trend is ARMs are depending more on pipeline and prefetch tricks
> to perform. For these tricks to work weak memory features need to be
> used at times.

My preference at the moment is "we'll see, lets sort out the problems
we know about first".

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 21:20             ` Russell King - ARM Linux
@ 2008-08-07 21:59               ` Russell King - ARM Linux
  2008-08-07 23:07               ` Woodruff, Richard
  2008-08-08 11:44               ` Catalin Marinas
  2 siblings, 0 replies; 24+ messages in thread
From: Russell King - ARM Linux @ 2008-08-07 21:59 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Thu, Aug 07, 2008 at 10:20:33PM +0100, Russell King - ARM Linux wrote:
> In the case of arch/arm/mach-omap2/irq.c, reading the INTC_REVISION
> reg after masking should be a sufficient solution.

And here's a patch to do exactly that.

diff --git a/arch/arm/mach-omap2/irq.c b/arch/arm/mach-omap2/irq.c
index 9ef15b3..27610b1 100644
--- a/arch/arm/mach-omap2/irq.c
+++ b/arch/arm/mach-omap2/irq.c
@@ -48,6 +48,7 @@ static struct omap_irq_bank {
 static void omap_ack_irq(unsigned int irq)
 {
 	__raw_writel(0x1, irq_banks[0].base_reg + INTC_CONTROL);
+	__raw_readl(irq_banks[0].base_reg + INTC_REVISION);
 }
 
 static void omap_mask_irq(unsigned int irq)
@@ -61,6 +62,7 @@ static void omap_mask_irq(unsigned int irq)
 	}
 
 	__raw_writel(1 << irq, irq_banks[0].base_reg + INTC_MIR_SET0 + offset);
+	__raw_readl(irq_banks[0].base_reg + INTC_REVISION);
 }
 
 static void omap_unmask_irq(unsigned int irq)


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 21:20             ` Russell King - ARM Linux
  2008-08-07 21:59               ` Russell King - ARM Linux
@ 2008-08-07 23:07               ` Woodruff, Richard
  2008-08-08  7:16                 ` Russell King - ARM Linux
  2008-08-08 11:44               ` Catalin Marinas
  2 siblings, 1 reply; 24+ messages in thread
From: Woodruff, Richard @ 2008-08-07 23:07 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]

<snip> - sorry for cutting of quality text.  If that is bad practice let me know and I'll not do it.

> What you're therefore asking for is a weak memory ordering model which
> doesn't require any effort on the software programmers part - that's
> a CPU architecture thing which you'll need to talk to ARM about.

I don't know that I'm asking.  But yes it would be great to have this all for free.  Winning the lottery would be good also :)

Yes, the architecture is a major part of the coding complexity.

> x86 can do this for the most part because x86's development has been
> such that the hardware has had to work around the software to make
> improvements.  On ARM, normally when there's updates, software has
> to work around the hardware.

Yes, I agree with your statements on x86.

PowerPC has had a weak model for a long time with these kinds of restrictions and it has done ok.  Many things ARM or x86 are doing today have existed in PPC long ago.

A discussion on merits of the weak memory model and peoples observations does seem fair.  Without public discussion and actual work on arm-linux and kernel.org I don't see how anyone would reasonably expect it in the common code.  In the past TI hasn't done so much but the plan is to do more of it going forward.

<snip>

> So, in the case of arch/arm/common/gic.c, we should be reading one of
> the gic control registers after the writes.  In the case of
> arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
> should be a sufficient solution.

Yes reading back a register from the same level is a valid method.  It may be sufficient here.

> But, not a barrier.

As I recall the original guidance from ARM (for irq-cntrl sync), was they didn't recommend using a sync instruction at the PIC as that would always block.  They recommended some sync instruction near CPSR changes which touched the irq mask bits.  This way the impact might be not always felt and would be folded in.

The cpu-memory-port is synchronized with the pipeline but not the cpu-peripheral-port (where pic was at and there is a write buffer) so the explicit sync was needed. You could unmask IRQs at the ARM before the PIC operation had completed otherwise.

There are a few sync primitives today barrier might not be the right word for this situation.

Our placement choice was mainly to keep code changes down.  Aggressive placement seemed more in the domain for people who care about tight real time constraints.

> I believe ARMv7 has some restrictions on dual mapping of the same
> space with different types, so don't expect this technique to always
> work.

Sounds familiar but I don't recall it.

As were not planning on using dual maps there isn't much consideration.  In this context it just helps illustrate weak memory behavior at devices.

> > Say TX IRQ happens at UART, I might check status bits through a SO
> mapping,
> > but when it comes time to fill the FIFO I write to the DEVICE mapping.
>
> Why?  Firstly, the read _has_ to complete before the program can
> continue.  (If it hasn't completed, you don't have the data to decide
> what to do next.)  Secondly, any previous device writes will have to
> complete before the read completes.
>
> So what does reading the status bits through a SO mapping gain you?
> The answer is, all other reads and writes previously issued by the
> program completing.  Does that affect the status that the UART is
> giving you?

Sorry, I'm missing something.

I thought the main point was you could load and unload your FIFO faster by taking advantage of bus posting & bursting if the FIFO was mapped as DEVICE when using PIO.

Control register access before you start to deal with the FIFO will always be synchronizing and you want it this way (when mixing reads and writes).

If I write a series of control register commands to device A, then write a go operation to device  B, I would hope all of A's writes had completed before B gets the go.  SO gives you this.  DEVICE may not with out barriers.

Regards,
Richard W.


-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 23:07               ` Woodruff, Richard
@ 2008-08-08  7:16                 ` Russell King - ARM Linux
  0 siblings, 0 replies; 24+ messages in thread
From: Russell King - ARM Linux @ 2008-08-08  7:16 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Catalin Marinas, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Thu, Aug 07, 2008 at 06:07:50PM -0500, Woodruff, Richard wrote:
> If I write a series of control register commands to device A, then
> write a go operation to device  B, I would hope all of A's writes
> had completed before B gets the go.  SO gives you this.  DEVICE may
> not with out barriers.

This, I think, is where the problem lies.

Device regions of the same type *are* ordered with respect to each other.

So, shared device accesses occur in program order.  Unshared device
accesses occur in program order.

However, shared device accesses may occur out of order with unshared
device accesses or memory accesses.  Unshared device accesses may
occur out of order with shared device accesses or memory accesses.

So, if both device A and device B are mapped as shared devices, then
accesses to both occur in program order.

If device A is mapped as a shared device and device B as an unshared
device, then you have to use read backs and possibly barriers to
ensure ordering.  Remember, a barrier only affects up to the CPU.
It doesn't affect write posting downstream of the CPU.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 16:45         ` Catalin Marinas
@ 2008-08-08  8:45           ` Paul Walmsley
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Walmsley @ 2008-08-08  8:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Woodruff, Richard, tony, Ben Dooks, linux-omap, linux-arm-kernel

On Thu, 7 Aug 2008, Catalin Marinas wrote:

> Many of the architecture people in ARM seem to be on holiday, I'll try
> to get clarification in about a week time.

That would be really great.  For reference, here is the assembly code in 
question:

    http://www.mail-archive.com/linux-omap@vger.kernel.org/msg01349.html

It's the code following the omap34xx_sram_configure_core_dpll entry point.


- Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-07 21:20             ` Russell King - ARM Linux
  2008-08-07 21:59               ` Russell King - ARM Linux
  2008-08-07 23:07               ` Woodruff, Richard
@ 2008-08-08 11:44               ` Catalin Marinas
  2008-08-08 13:19                 ` Russell King - ARM Linux
  2008-08-11  7:50                 ` Paul Walmsley
  2 siblings, 2 replies; 24+ messages in thread
From: Catalin Marinas @ 2008-08-08 11:44 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Woodruff, Richard, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Thu, 2008-08-07 at 22:20 +0100, Russell King - ARM Linux wrote:
> On Thu, Aug 07, 2008 at 03:38:55PM -0500, Woodruff, Richard wrote:
> > 
> > > From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> > > > Is DEVICE really safe for things other than FIFOs with out the use of
> > > > barriers?
> > >
> > > As far as I'm aware, yes - and that comment is based solely upon the
> > > fact that no one has reported any problems with the kernel which have
> > > been tracked down to using the device memory type on ARMv6 and above...
> > >
> > > > We do in some drivers today get spurious interrupts when DEVICE is
> > > > used but don't see them when using SO.
> > >
> > > ... until now, or even that very sentence.
> > 
> > That is our fault then I suppose for not discussing this on arm-linux.
> > In OMAP2 and OMAP3 this has been observed.  In vendor kernels where
> > time stands still and lots of validation has happened we did stick
> > with SO for OMAP2.  On some internal kernels already we have gone to
> > SO for OMAP3 as customers ramp and need the errors gone.  The faster
> > the system clocks the more it seems to show.
> 
> To do that, and then ask about when Linux is going to start exploiting
> the weak memory types is a little unfair don't you think?

There are already CPUs with weaker memory ordering model than ARM (e.g.
Alpha) and they are supported by Linux. Of course, there may be problems
with drivers since most of them are developed in x86.

On the strongly-ordered instead of normal uncached memory, the unaligned
accesses to SO memory which are not faulted have unpredictable behaviour
(according to the ARM ARM, though some v7 implementations may not be
bothered). If you use such memory for skbuff for example, is there any
risk of unaligned accesses when the network packets are processed? Is
there any other example that would make this fail?

> > The thing with these effects, especially spurious IRQs is there usually
> > are several reasons they show up and several ways to make them go away.
> > In the beginning there have been lots then they drop off as the system
> > software matures.  Then if the program survives long enough to be
> > optimized they start to show up again but in lesser numbers.  This has
> > been the OMAP2/3 experience so far.  Going SO to control regions has
> > stamped them out at this point.
> 
> What you're therefore asking for is a weak memory ordering model which
> doesn't require any effort on the software programmers part - that's
> a CPU architecture thing which you'll need to talk to ARM about.
> 
> x86 can do this for the most part because x86's development has been
> such that the hardware has had to work around the software to make
> improvements.  On ARM, normally when there's updates, software has
> to work around the hardware.

For ARM CPU (RISC architecture) to get faster while keeping the power
consumption low, it must become weakly ordered (maybe CISC architectures
can cope with this). Anyway, it seems that ia64 requires some barriers
as well. That's more like evolution in the CPU field and while the
software becomes more complex, the overall performance is better.

> > > That's not unexpected if you don't have the right barriers in place
> > > at the end of things such as IRQ controllers ack/mask functions.
> > 
> > Yes. I've submitted patches (to linux-omap) and Catalin did submit
> > patches (to arm-linux) for PIC barriers.  In the past they have been
> > rejected by Tony or you for different reasons.  Tony last rejected
> > it because he thought it should be generic at the ARM level.  I
> > don't recall what your last stance was.
> 
> Looking back, I never commented on that patch.  I did on the previous
> patch which was adding DSBs in a way which would break stuff.  The
> patch to add them to the interrupt controllers has never been reposted.
> 
> However, adding barriers may not be the correct answer for this.
> See Documentation/io_ordering.txt - reading back from a safe register
> on the target device ensures that the previous writes should hit the
> device before the read completes, without the overhead of a full
> barrier.
> 
> This point is even more important if you have some form of write
> posting between the CPU and the device (eg, a PCI bus) - a DSB
> won't reach down to the target PCI device which may be behind some
> write-posting bridges.
> 
> So, in the case of arch/arm/common/gic.c, we should be reading one of
> the gic control registers after the writes.  In the case of
> arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
> should be a sufficient solution.

I need to check in ARM when people come from holidays but a simple LDR
might not be enough to guarantee that a CPSIE etc. happens after it. You
may need to add either an LDR + CMP (or some other usage of the loaded
register) or LDR + DSB. I agree that DSB alone is not enough.

> > Use a dual mapping to manage a device (2 ioremaps).  You use a SO mapping
> > to write to registers of that device.  Then when you go to write to its
> > FIFO use a DEVICE mapping.
> 
> I believe ARMv7 has some restrictions on dual mapping of the same
> space with different types, so don't expect this technique to always
> work.

This setup may lead to unpredictable behaviour if not used properly. I
think it is allowed as long as accesses to these mappings are separated
by a DSB or return from / entering an exception.

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-08 11:44               ` Catalin Marinas
@ 2008-08-08 13:19                 ` Russell King - ARM Linux
  2008-08-08 16:40                   ` Catalin Marinas
  2008-08-11  7:50                 ` Paul Walmsley
  1 sibling, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2008-08-08 13:19 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Woodruff, Richard, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Fri, Aug 08, 2008 at 12:44:49PM +0100, Catalin Marinas wrote:
> There are already CPUs with weaker memory ordering model than ARM (e.g.
> Alpha) and they are supported by Linux. Of course, there may be problems
> with drivers since most of them are developed in x86.

There are, and they are _constantly_ complaining about drivers not
having the necessary barriers.

Consider that for a moment - how long has Linux supported had weakly
ordered architectures, and how long does it take to fix ordering
problems... 10 or so years?

> > So, in the case of arch/arm/common/gic.c, we should be reading one of
> > the gic control registers after the writes.  In the case of
> > arch/arm/mach-omap2/irq.c, reading the INTC_REVISION reg after masking
> > should be a sufficient solution.
> 
> I need to check in ARM when people come from holidays but a simple LDR
> might not be enough to guarantee that a CPSIE etc. happens after it. You
> may need to add either an LDR + CMP (or some other usage of the loaded
> register) or LDR + DSB. I agree that DSB alone is not enough.

Okay, I give up on this issue.  Weak memory ordering seems to be a
very very big can of worms.  And then there's this:

14:07 < rmk> so we're back to making readl() itself do something with the
             data... which brings us back to that question about why bother
             with weak ordering
14:10 < willy> you can't have weak ordering for device control registers
14:11 < rmk> yes you can, provided they're ordered wrt each other.
14:11 < willy> weak ordering works great for SMP or for just covering up latency
14:12 < willy> no, you can't.  see writel(); readl(); udelay(1); writel();.
               You didn't wait for 1 microsecond before accessing the device
               again.

Or, to put it another way, it seems that on Linux _all_ devices must
be strongly ordered or be seen to Linux as being strongly ordered
(iow, readl and writel and friends _must_ have a barrier.)

And of course, putting barriers into readl and writel, we might as well
use strongly ordered mappings anyway, because that'll save us a few
bytes of program memory.

TBH, this is becoming soo much of a joke, it's untrue.

Let's go back to having a strongly ordered memory model.  Please.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-08 13:19                 ` Russell King - ARM Linux
@ 2008-08-08 16:40                   ` Catalin Marinas
  0 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2008-08-08 16:40 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Woodruff, Richard, Paul Walmsley, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Fri, 2008-08-08 at 14:19 +0100, Russell King - ARM Linux wrote:
> On Fri, Aug 08, 2008 at 12:44:49PM +0100, Catalin Marinas wrote:
> > I need to check in ARM when people come from holidays but a simple LDR
> > might not be enough to guarantee that a CPSIE etc. happens after it. You
> > may need to add either an LDR + CMP (or some other usage of the loaded
> > register) or LDR + DSB. I agree that DSB alone is not enough.
> 
> Okay, I give up on this issue.  Weak memory ordering seems to be a
> very very big can of worms.  And then there's this:
> 
> 14:07 < rmk> so we're back to making readl() itself do something with the
>              data... which brings us back to that question about why bother
>              with weak ordering
> 14:10 < willy> you can't have weak ordering for device control registers
> 14:11 < rmk> yes you can, provided they're ordered wrt each other.
> 14:11 < willy> weak ordering works great for SMP or for just covering up latency
> 14:12 < willy> no, you can't.  see writel(); readl(); udelay(1); writel();.
>                You didn't wait for 1 microsecond before accessing the device
>                again.
> 
> Or, to put it another way, it seems that on Linux _all_ devices must
> be strongly ordered or be seen to Linux as being strongly ordered
> (iow, readl and writel and friends _must_ have a barrier.)

Why not put a barrier in udelay? The delay functions are mainly used for
device access.

> TBH, this is becoming soo much of a joke, it's untrue.

For performance in a multi-stage pipeline model, you need weaker
ordering. I'm not a hardware engineer to comment more on this.

> Let's go back to having a strongly ordered memory model.  Please.

On pre-ARMv6, updates to CPSR are guaranteed to take place in program
order with a strongly-ordered memory access. This is still true on
ARMv6/v7 for backward compatibility but the feature is deprecated and it
might not be true for future architecture versions. At some point
barriers might be needed.

-- 
Catalin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] ARM MMU: add strongly-ordered memory type
  2008-08-08 11:44               ` Catalin Marinas
  2008-08-08 13:19                 ` Russell King - ARM Linux
@ 2008-08-11  7:50                 ` Paul Walmsley
  1 sibling, 0 replies; 24+ messages in thread
From: Paul Walmsley @ 2008-08-11  7:50 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Russell King - ARM Linux, Woodruff, Richard,
	linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk

On Fri, 8 Aug 2008, Catalin Marinas wrote:

> There are already CPUs with weaker memory ordering model than ARM (e.g.
> Alpha) and they are supported by Linux. Of course, there may be problems
> with drivers since most of them are developed in x86.

For the OMAP SoC, most of the drivers are specific to ARM.  The devices 
just aren't available for x86 or other architectures.  (With a few 
exceptions - smc91x and MUSB are the two that come to mind.)


- Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2008-08-11  7:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-04 23:40 [PATCH] ARM MMU: add strongly-ordered memory type Paul Walmsley
2008-08-05 11:49 ` Ben Dooks
2008-08-05 12:15   ` Woodruff, Richard
2008-08-06 10:20     ` Catalin Marinas
2008-08-06 12:28       ` Woodruff, Richard
2008-08-07 16:55         ` Catalin Marinas
2008-08-07  6:01       ` Paul Walmsley
2008-08-07 16:45         ` Catalin Marinas
2008-08-08  8:45           ` Paul Walmsley
2008-08-06  9:53 ` Catalin Marinas
2008-08-06 12:21   ` Woodruff, Richard
2008-08-07  7:30     ` Russell King - ARM Linux
2008-08-07 16:01       ` Catalin Marinas
2008-08-07 18:56       ` Woodruff, Richard
2008-08-07 19:25         ` Russell King - ARM Linux
2008-08-07 20:38           ` Woodruff, Richard
2008-08-07 21:20             ` Russell King - ARM Linux
2008-08-07 21:59               ` Russell King - ARM Linux
2008-08-07 23:07               ` Woodruff, Richard
2008-08-08  7:16                 ` Russell King - ARM Linux
2008-08-08 11:44               ` Catalin Marinas
2008-08-08 13:19                 ` Russell King - ARM Linux
2008-08-08 16:40                   ` Catalin Marinas
2008-08-11  7:50                 ` Paul Walmsley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox