From: santosh.shilimkar@ti.com (Santosh Shilimkar)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 3/4] ARM: bL_entry: Match memory barriers to architectural requirements
Date: Wed, 16 Jan 2013 17:41:00 +0530 [thread overview]
Message-ID: <50F698D4.2050702@ti.com> (raw)
In-Reply-To: <20130116114912.GB1963@linaro.org>
On Wednesday 16 January 2013 05:19 PM, Dave Martin wrote:
> On Wed, Jan 16, 2013 at 12:20:47PM +0530, Santosh Shilimkar wrote:
>> + Catalin, RMK
>>
>> Dave,
>>
>> On Tuesday 15 January 2013 10:18 PM, Dave Martin wrote:
>>> For architectural correctness even Strongly-Ordered memory accesses
>>> require barriers in order to guarantee that multiple CPUs have a
>>> coherent view of the ordering of memory accesses.
>>>
>>> Virtually everything done by this early code is done via explicit
>>> memory access only, so DSBs are seldom required. Existing barriers
>>> are demoted to DMB, except where a DSB is needed to synchronise
>>> non-memory signalling (i.e., before a SEV). If a particular
>>> platform performs cache maintenance in its power_up_setup function,
>>> it should force it to complete explicitly including a DSB, instead
>>> of relying on the bL_head framework code to do it.
>>>
>>> Some additional DMBs are added to ensure all the memory ordering
>>> properties required by the race avoidance algorithm. DMBs are also
>>> moved out of loops, and for clarity some are moved so that most
>>> directly follow the memory operation which needs to be
>>> synchronised.
>>>
>>> The setting of a CPU's bL_entry_vectors[] entry is also required to
>>> act as a synchronisation point, so a DMB is added after checking
>>> that entry to ensure that other CPUs do not observe gated
>>> operations leaking across the opening of the gate.
>>>
>>> Signed-off-by: Dave Martin <dave.martin@linaro.org>
>>> ---
>>
>> Sorry to pick on this again, but I am not able to understand why
>> strongly-ordered accesses need barriers. At least from the
>> ARM point of view, a strongly-ordered write is more of a blocking
>> write, and the interconnect further out is also supposed to respect that
>
> This is what I originally assumed (hence the absence of barriers in
> the initial patch).
>
>> rule. SO reads and writes are like adding a barrier after every load/store,
>
> This assumption turns out to be wrong, unfortunately, although in
> a uniprocessor scenario it makes no difference. An SO memory access
> does block the CPU making the access, but explicitly does not
> block the interconnect.
>
I suspected the interconnect was involved when you described the
barrier requirement for SO memory regions.
> In a typical boot scenario for example, all secondary CPUs are
> quiescent or powered down, so there's no problem. But we can't make
> the same assumptions when we're trying to coordinate between
> multiple active CPUs.
>
>> so adding explicit barriers doesn't make sense. Is this a side
>> effect of some "write early response" kind of optimisation at
>> the interconnect level?
>
> Strongly-Ordered accesses are always non-shareable, so there is
> no explicit guarantee of coherency between multiple masters.
>
This is probably where the issue is, then. My understanding was exactly
the opposite, and hence I wasn't worried about the multi-master
CPU scenario, since the shareability attributes would take care of it,
considering that the same page tables are used in an SMP system.
The ARM documentation says:
------------------------------------------------------------
Shareability and the S bit, with TEX remap
The memory type of a region, as indicated in the Memory type column of
Table B3-12 on page B3-1350, provides the first level of control of
whether the region is shareable:
* If the memory type is Strongly-ordered then the region is Shareable
------------------------------------------------------------
> If there is only one master, it makes no difference, but if there
> are multiple masters, there is no guarantee that they are connected
> to a slave device (DRAM controller in this case) via a single
> slave port.
>
See above. We are talking about multiple CPUs here, not
the DSP or other co-processors. In either case we are discussing
code which executes on ARM CPUs, so we can safely limit
it to the multi-master ARM CPU case.
> The architecture only guarantees global serialisation when there is a
> single slave device, but provides no way to know whether two accesses
> from different masters will reach the same slave port. This is in the
> realms of "implementation defined."
>
> Unfortunately, a high-performance component like a DRAM controller
> is exactly the kind of component which may implement multiple
> master ports, so you can't guarantee that accesses are serialised
> in the same order from the perspective of all masters. There may
> be some pipelining and caching between each master port and the actual
> memory, for example. This is allowed, because there is no requirement
> for the DMC to look like a single slave device from the perspective
> of multiple masters.
>
> A multi-ported slave might provide transparent coherency between master
> ports, but it is only required to guarantee this when the accesses
> are shareable (SO is always non-shared), or when explicit barriers
> are used to force synchronisation between the device's master ports.
>
> Of course, a given platform may have a DMC with only one slave
> port, in which case the barriers should not be needed. But I wanted
> this code to be generic enough to be reusable -- hence the
> addition of the barriers. The CPU does not need to wait for a DMB
> to "complete" in any sense, so this does not necessarily have a
> meaningful impact on performance.
>
> This is my understanding anyway.
>
>> Will you be able to point to specs or documents which state
>> this requirement?
>
> Unfortunately, this is one of this things which we require not because
> there is a statement in the ARM ARM to say that we need it -- rather,
> there is no statement in the ARM ARM to say that we don't.
>
Thanks a lot for the elaborate answer. It helps to understand the
rationale, at least.
Regards
Santosh
Thread overview (15+ messages):
2013-01-15 16:48 [RFC PATCH 0/4] b.L: Memory barriers and miscellaneous tidyups Dave Martin
2013-01-15 16:48 ` [RFC PATCH 1/4] ARM: b.L: Remove C declarations for vlocks Dave Martin
2013-01-15 16:48 ` [RFC PATCH 2/4] ARM: b.L: vlocks: Add architecturally required memory barriers Dave Martin
2013-01-15 16:48 ` [RFC PATCH 3/4] ARM: bL_entry: Match memory barriers to architectural requirements Dave Martin
2013-01-16 6:50 ` Santosh Shilimkar
2013-01-16 11:49 ` Dave Martin
2013-01-16 12:11 ` Santosh Shilimkar [this message]
2013-01-16 12:47 ` Dave Martin
2013-01-16 14:36 ` Santosh Shilimkar
2013-01-16 15:05 ` Catalin Marinas
2013-01-16 15:37 ` Dave Martin
2013-01-17 6:39 ` Santosh Shilimkar
2013-01-15 16:48 ` [RFC PATCH 4/4] ARM: vexpress/dcscb: power_up_setup memory barrier cleanup Dave Martin
2013-01-15 17:29 ` [RFC PATCH 0/4] b.L: Memory barriers and miscellaneous tidyups Nicolas Pitre
2013-01-15 17:42 ` Dave Martin