[RFC] Kernel semantics of relaxed MMIO accessors

From: Will Deacon @ 2013-09-09 11:44 UTC
  To: linux-arch
  Cc: benh, linux, catalin.marinas, x86, jgunthorpe, gregory.clement,
	ezequiel.garcia, JBottomley, npiggin

Hello,

During the review of a recent patch to add support for atomic MMIO
read-modify-write sequences between drivers on ARM, it was suggested
that this code could be made generic and used by other architectures.

  http://lists.infradead.org/pipermail/linux-arm-kernel/2013-August/194178.html

However, making this generic requires the availability of relaxed MMIO
accessors across all architectures because { readX(); modify(); writeX(); }
is an extremely expensive sequence on ARM. The expense comes from the
heavyweight barriers inside our accessor macros, which are there to satisfy
the conclusions of this earlier thread on ordering against cacheable memory
(conclusions that do make sense from a driver writer's perspective):

  http://www.gossamer-threads.com/lists/linux/kernel/932153?do=post_view_threaded#932153
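
For concreteness, here is a minimal user-space sketch of the kind of
locked read-modify-write helper being discussed. It is not the code from
the patch linked above: every mock_* name is a hypothetical stand-in, the
plain volatile accesses only model barrier-free relaxed accessors, and the
C11 atomic_flag stands in for a kernel spinlock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mocks: a plain volatile access models a relaxed MMIO
 * access that carries no barriers of its own. */
static inline uint32_t mock_readl_relaxed(const volatile uint32_t *addr)
{
	return *addr;
}

static inline void mock_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* One lock shared by every driver touching the register (assumption:
 * stands in for a kernel spinlock). */
static atomic_flag mock_io_lock = ATOMIC_FLAG_INIT;

static void mock_io_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&mock_io_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void mock_io_lock_release(void)
{
	atomic_flag_clear_explicit(&mock_io_lock, memory_order_release);
}

/* Replace the bits selected by mask with the matching bits of set. */
static void mock_atomic_io_modify(volatile uint32_t *reg,
				  uint32_t mask, uint32_t set)
{
	uint32_t val;

	mock_io_lock_acquire();
	val = mock_readl_relaxed(reg);
	val = (val & ~mask) | (set & mask);
	mock_writel_relaxed(val, reg);
	mock_io_lock_release();
}
```

The point of the sketch is that the lock already serialises the sequence,
so paying for a full barrier on both the read and the write inside the
locked region buys nothing as long as the relaxed accessors stay ordered
against lock/unlock and against other I/O to the same device.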

The problem with relaxed accessors (which is also mentioned in the thread
above) is that they don't seem to have well-defined semantics across all
architectures. For example, the table below illustrates a few architectures
and their behaviour in this area (please correct any mistakes or add any
interesting architectures):

Ordered against: | IO (same device) | Cacheable accesses | Spin lock/unlock |
-----------------+------------------+--------------------+------------------+
ARM/ARM64        |                  |                    |                  |
  readX/writeX   |        Y         |         Y          |        Y         |
  _relaxed       |        Y         |         N          |        Y         |
                 |                  |                    |                  |
Alpha            |                  |                    |                  |
  readX/writeX   |        Y         |         Y          |        Y         |
  _relaxed       |        N*        |         N          |        Y         |
                 |                  |                    |                  |
PowerPC**        |                  |                    |                  |
  readX/writeX   |        Y         |         Y          |        Y         |
  _relaxed       |        Y         |         Y          |        Y         |
                 |                  |                    |                  |
x86              |                  |                    |                  |
  readX/writeX   |        Y         |         Y          |        Y         |
  _relaxed***    |        N         |         N          |        Y         |

*   Depends on the specific machine, as far as I can tell.
**  _relaxed accessors are simply #defined as the non-relaxed variants, so
    could be improved.
*** Potential for re-ordering by the compiler.
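
To illustrate the PowerPC footnote, the aliasing pattern looks roughly
like the sketch below. This is a user-space model, not any architecture's
real header: the mock_* names and the barrier counter are hypothetical and
exist only to make the cost visible.

```c
#include <stdint.h>

/* Hypothetical counter so the cost of the ordered accessor is observable. */
static unsigned int mock_barriers_issued;

static inline void mock_full_barrier(void)
{
	mock_barriers_issued++;
}

/* Stand-in for a fully ordered accessor: the access plus a barrier. */
static inline uint32_t mock_readl(const volatile uint32_t *addr)
{
	uint32_t val = *addr;

	mock_full_barrier();
	return val;
}

/* Fallback: with no cheaper form available, the relaxed name is just an
 * alias for the ordered accessor. */
#ifndef mock_readl_relaxed
#define mock_readl_relaxed(addr) mock_readl(addr)
#endif
```

Aliasing like this is always safe, since the ordered accessor is at least
as strong as anything a relaxed one could promise, but the caller still
pays for the barrier on every access.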

On top of that, there is the concept of relaxed transactions in PCI-X and
PCI-E, which seem to permit re-ordering of accesses to the same address!
I think this is also why, whilst readX_relaxed is implemented on almost all
architectures, writeX_relaxed is very uncommon.

Documentation/memory-barriers.txt states vaguely that readX_relaxed is
"not guaranteed to be ordered in any way" whilst
Documentation/DocBook/deviceiobook.tmpl explicitly ties the relaxed ordering
to I/O accesses and DMA writes from a device.

So this email is a bit of a cry for help. I'd like to try and define some
common semantics for relaxed I/O accessors so that they can be implemented
by all architectures and relied upon by driver writers, including the
addition of relaxed writes.

My basic proposal would be to copy the ARM definition of _relaxed accessors
(i.e. only relax ordering against cacheable accesses), which is the semantic
hinted at by Nick when this was last discussed:

  http://www.gossamer-threads.com/lists/linux/kernel/932390?do=post_view_threaded#932390

This should allow for significant performance improvements in drivers which
don't care about normal memory ordering most of the time yet do have strict
requirements on ordering of I/O accesses (I think this is the common case).
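
As a rough sketch of how a driver might use the proposed semantics: the
device registers are programmed with relaxed writes, which would stay
ordered against each other, and only the final doorbell write pays for a
barrier against normal memory. Everything below is a user-space model
under those assumed semantics. The register layout, the mock_* names and
the event log are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical event log: 'w' = device write, 'B' = memory barrier. */
#define MOCK_LOG_MAX 8
static char mock_log[MOCK_LOG_MAX + 1];
static unsigned int mock_log_len;

static void mock_log_event(char ev)
{
	if (mock_log_len < MOCK_LOG_MAX)
		mock_log[mock_log_len++] = ev;
}

/* Relaxed write: ordered against other I/O to the device, nothing else. */
static void mock_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	mock_log_event('w');
}

/* Ordered write: barrier first, so earlier memory writes are visible. */
static void mock_writel(uint32_t val, volatile uint32_t *addr)
{
	mock_log_event('B');
	mock_writel_relaxed(val, addr);
}

/* Hypothetical register block of a DMA-capable device. */
struct mock_regs {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t len;
	uint32_t doorbell;
};

static void mock_start_dma(volatile struct mock_regs *regs, uint32_t *desc,
			   uint64_t dma_addr, uint32_t len)
{
	desc[0] = len;	/* normal memory write the device will read later */

	/* Setup registers only need ordering against each other. */
	mock_writel_relaxed((uint32_t)dma_addr, &regs->addr_lo);
	mock_writel_relaxed((uint32_t)(dma_addr >> 32), &regs->addr_hi);
	mock_writel_relaxed(len, &regs->len);

	/* The doorbell must not pass the descriptor write above. */
	mock_writel(1, &regs->doorbell);
}
```

With fully ordered accessors throughout, the same function would issue
four barriers instead of one.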

All feedback/suggestions/war stories welcome!

Will


* Re: [RFC] Kernel semantics of relaxed MMIO accessors
From: Will Deacon @ 2013-09-17 11:32 UTC
  To: linux-arch@vger.kernel.org
  Cc: benh@kernel.crashing.org, linux@arm.linux.org.uk, Catalin Marinas,
	x86@kernel.org, jgunthorpe@obsidianresearch.com,
	gregory.clement@free-electrons.com,
	ezequiel.garcia@free-electrons.com, JBottomley@Parallels.com,
	npiggin@kernel.dk, davem, linux-kernel

[expanding CC list and bumping since the merge window is now over]

