* On __raw_readl readl_relaxed and readl nocheinmal
@ 2011-05-20 10:04 Linus Walleij
2011-05-20 10:42 ` Russell King - ARM Linux
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Linus Walleij @ 2011-05-20 10:04 UTC (permalink / raw)
To: linux-arm-kernel
Now I get these questions about I/O accessors from all over the
place where I'm working. It seems some public clarification is needed...
My current understanding:
__raw_writel(a, reg1);
__raw_writel(b, reg2);
This does not guarantee that the write of b into reg2 is even done
after writing a into reg1 due to instruction reordering.
writel_relaxed(a, reg1);
writel_relaxed(b, reg2);
This will insert a barrier() so we know that the CPU will execute the
write of a before the write of b. However it does not mandate that
reg2 is written with b before reg1 is written with a at the hardware
register level.
writel(a, reg1);
writel(b, reg2);
This actually pushes the value all the way through so that you know
that the values have landed in the hardware after each statement.
What we would like to know is the effect of things like this:
__raw_writel(a, reg1);
__raw_writel(b, reg1);
__raw_writel(c, reg1);
writel_relaxed(a, reg1);
writel_relaxed(b, reg1);
writel_relaxed(c, reg1);
My *guess* is that in the first case the pipeline may even remove
the writes of a and b to reg1 since it only cares about the end
result (insert the volatile story in
Documentation/volatile_considered_harmful.txt here)
The second case (writel_relaxed() to the same register) would
make sure that the writes actually happen in sequence,
but after the last statement it may take a while before the
actual hardware write happens.
And what about this:
writel_relaxed(a, reg1);
writel_relaxed(b, reg1);
writel(c, reg1);
I *think* this means that the writes will be done in sequence,
and after the last statement you know all writes have commenced.
So beat me up now.
Yours,
Linus Walleij
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-20 10:04 On __raw_readl readl_relaxed and readl nocheinmal Linus Walleij
@ 2011-05-20 10:42 ` Russell King - ARM Linux
2011-05-20 10:50 ` Arnd Bergmann
2011-05-20 10:55 ` Catalin Marinas
2 siblings, 0 replies; 16+ messages in thread
From: Russell King - ARM Linux @ 2011-05-20 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 20, 2011 at 12:04:48PM +0200, Linus Walleij wrote:
> My current understanding:
>
> __raw_writel(a, reg1);
> __raw_writel(b, reg2);
>
> This does not guarantee that the write of b into reg2 is even done
> after writing a into reg1 due to instruction reordering.
No. It's not about instruction re-ordering. The instruction stream will
show a write to reg1 followed by a write to reg2.
The writes are performed using the CPU's current endianness. The CPU and
buses are free to re-order this if they so wish, and the CPU is free to
re-order them with respect to other types of access (iow, memory and
strongly ordered.)
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg2);
>
> This will insert a barrier() so we know that the CPU will execute the
> write of a before the write of b. However it does not mandate that
> reg2 is written with b before reg1 is written with a at the hardware
> register level.
These writes are performed using little endian byte order. The CPU and
buses are free to re-order this if they so wish, and the CPU is free to
re-order them with respect to other types of access.
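To make the byte-order point concrete, here is a minimal user-space sketch. It is not the kernel implementation: struct fake_reg, model_raw_writel() and model_writel_relaxed() are hypothetical names, and a plain byte array stands in for the register. It only shows that the raw variant stores in the CPU's native order while the relaxed variant always produces little endian bytes; it says nothing about ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit device register: four bytes of plain memory. */
struct fake_reg {
    uint8_t bytes[4];
};

/* Model of __raw_writel(): store the value in the CPU's native byte order. */
static void model_raw_writel(uint32_t val, struct fake_reg *reg)
{
    assert(reg != NULL);
    memcpy(reg->bytes, &val, sizeof(val));
}

/* Model of the byte-order step in writel_relaxed(): always store little endian. */
static void model_writel_relaxed(uint32_t val, struct fake_reg *reg)
{
    assert(reg != NULL);
    reg->bytes[0] = (uint8_t)(val & 0xff);
    reg->bytes[1] = (uint8_t)((val >> 8) & 0xff);
    reg->bytes[2] = (uint8_t)((val >> 16) & 0xff);
    reg->bytes[3] = (uint8_t)((val >> 24) & 0xff);
}
```

On a little endian CPU both functions leave the same bytes in the array; only on a big endian CPU do they differ.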
> writel(a, reg1);
> writel(b, reg2);
>
> This actually pushes the value all the way through so that you know
> that the values have landed in the hardware after each statement.
These writes are again performed using little endian byte order. We
insert barriers to ensure that the CPU does not re-order these with
respect to other accesses, including memory accesses. However, downstream
bus hardware may still delay the writes, which may result in the write to
reg2 arriving before reg1, especially if they're on different buses.
> What we would like to know is the effect of things like this:
>
> __raw_writel(a, reg1);
> __raw_writel(b, reg1);
> __raw_writel(c, reg1);
>
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg1);
> writel_relaxed(c, reg1);
>
> My *guess* is that in the first case the pipeline may even remove
> the writes of a and b to reg1 since it only cares about the end
> result (insert the volatile story in
> Documentation/volatile_considered_harmful.txt here)
No. The compiler can't optimize the volatile accesses like that. The
volatile document is written from the point of making driver writers
use the accessor functions.
In both cases, because the write is to the same register, and as I've
said above, they won't be re-ordered in the instruction stream, plus
if reg1 is _device_ memory, they will appear in-order on the destination
bus.
> make sure that the writes actually happen in sequence,
> but after the last statement it may take a while before the
> actual hardware write happens.
>
> And what about this:
>
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg1);
> writel(c, reg1);
The same. But a and b may arrive at the device before previous writes to
memory have completed, whereas c will only arrive after all of a, b and
previous memory writes have completed.
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-20 10:04 On __raw_readl readl_relaxed and readl nocheinmal Linus Walleij
2011-05-20 10:42 ` Russell King - ARM Linux
@ 2011-05-20 10:50 ` Arnd Bergmann
2011-05-20 10:55 ` Catalin Marinas
2 siblings, 0 replies; 16+ messages in thread
From: Arnd Bergmann @ 2011-05-20 10:50 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 20 May 2011 12:04:48 Linus Walleij wrote:
> Now I get these questions about I/O accessors from all over the
> place where I'm working. It seems some public clarification is needed...
>
> My current understanding:
>
> __raw_writel(a, reg1);
> __raw_writel(b, reg2);
>
> This does not guarantee that the write of b into reg2 is even done
> after writing a into reg1 due to instruction reordering.
There are a lot of things that this does not guarantee, but it does
guarantee the order in which the instructions are executed on the
CPU (because of the volatile). For example, it does not guarantee
that the write is atomic, ordering with regard to spinlocks, or the
endianness of the access.
What exactly goes on there is highly architecture specific, so
even if something specific happens on ARM, the result may be something
completely different on another architecture.
Just don't use them.
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg2);
>
> This will insert a barrier() so we know that the CPU will execute the
> write of a before the write of b. However it does not mandate that
> reg2 is written with b before reg1 is written with a at the hardware
> register level.
I don't see a barrier in the definition of writel_relaxed, but
the instructions on the CPU are in order because of the volatile
access. On all sane hardware, this means that they also arrive
in the registers in order, as long as they are for the same
device. However, writing into a PCI device (reg1) first and then
writing into a local SoC register (reg2) will typically mean
they arrive in a different order.
> writel(a, reg1);
> writel(b, reg2);
>
> This actually pushes the value all the way through so that you know
> that the values have landed in the hardware after each statement.
You know that the first write has made it to the bus before the second
one, because each writel flushes the previous write accesses.
Whether that means that it has made it to the device depends on
what bus reg1 is on.
The write to reg2 may still be in flight for an indefinite amount
of time after the second writel, unless you do a bus synchronizing
instruction, e.g. a readl(reg2).
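The posted-write behaviour can be pictured with a toy model. This is only an illustration in user-space C, not how any real bus or the kernel accessors are implemented: struct toy_bus, toy_writel() and toy_readl() are made-up names, and the "bus" is just a small queue that a read drains before returning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POSTED_MAX 8

/* Toy model: a bus that queues (posts) writes until something drains it. */
struct toy_bus {
    uint32_t device_reg;          /* what the device has actually seen */
    uint32_t posted[POSTED_MAX];  /* writes still in flight */
    size_t   n_posted;
};

/* Posted write: returns at once, the device has not seen the value yet. */
static void toy_writel(struct toy_bus *bus, uint32_t val)
{
    assert(bus != NULL && bus->n_posted < POSTED_MAX);
    bus->posted[bus->n_posted++] = val;
}

/* Read: not posted, so the queued writes are delivered in order first. */
static uint32_t toy_readl(struct toy_bus *bus)
{
    assert(bus != NULL);
    for (size_t i = 0; i < bus->n_posted; i++)
        bus->device_reg = bus->posted[i];
    bus->n_posted = 0;
    return bus->device_reg;
}
```

In this model, after two toy_writel() calls device_reg is still unchanged, and only the toy_readl() that follows makes the device see both values in order. How long a real write stays in flight, and what actually flushes it, depends on the bus.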
> What we would like to know is the effect of things like this:
>
> __raw_writel(a, reg1);
> __raw_writel(b, reg1);
> __raw_writel(c, reg1);
>
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg1);
> writel_relaxed(c, reg1);
>
> My *guess* is that in the first case the pipeline may even remove
> the writes of a and b to reg1 since it only cares about the end
> result (insert the volatile story in
> Documentation/volatile_considered_harmful.txt here)
>
> The second case (writel_relaxed() to the same register) would
> make sure that the writes actually happen in sequence,
> but after the last statement it may take a while before the
> actual hardware write happens.
You are probably confusing this with the page flags. If the memory
is in write-combining mode, all six writes may be combined into
a single bus access, if it's not write-combining, they get sent
to the bus as separate operations, and then it depends on the bus.
The writel_relaxed will also do a byte swap on big-endian architectures,
which the __raw_writel never does.
In theory, writel_relaxed should ensure that the access is done
atomically as a 32 bit write and cannot be replaced with four byte
accesses, but currently we allow the compiler to do either. Most
of the time, it chooses to do 32 bit writes for both __raw_writel
and for writel_relaxed.
> And what about this:
>
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg1);
> writel(c, reg1);
>
> I *think* this means that the writes will be done in sequence,
> and after the last statement you know all writes have commenced.
No, even writel() is still posted, it may not have arrived at
the device when you do the next instruction. The difference between
writel_relaxed and writel is that prior accesses to other data have
completed before the writel becomes visible on the bus.
What you need to do in order to be sure that the writel has made
it to the device depends on the specific bus. Originally, writel
was intended for PCI buses and similar things where you have to
readl from the same device in order to actually flush it all the
way down.
This is the main difference to PIO accessors (outl) that operate
on a special non-posted memory range that is only visible to
a few bus types like PCI or PCMCIA.
Arnd
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-20 10:04 On __raw_readl readl_relaxed and readl nocheinmal Linus Walleij
2011-05-20 10:42 ` Russell King - ARM Linux
2011-05-20 10:50 ` Arnd Bergmann
@ 2011-05-20 10:55 ` Catalin Marinas
2011-05-27 12:07 ` Jamie Lokier
` (2 more replies)
2 siblings, 3 replies; 16+ messages in thread
From: Catalin Marinas @ 2011-05-20 10:55 UTC (permalink / raw)
To: linux-arm-kernel
Linus,
On 20 May 2011 11:04, Linus Walleij <linus.walleij@linaro.org> wrote:
> Now I get these questions about I/O accessors from all over the
> place where I'm working. It seems some public clarification is needed...
There's not a simple answer as other architectures do something else.
For ARM I can answer below.
> My current understanding:
>
> __raw_writel(a, reg1);
> __raw_writel(b, reg2);
>
> This does not guarantee that the write of b into reg2 is even done
> after writing a into reg1 due to instruction reordering.
The compiler should not reorder them as they are volatile accesses.
On ARM it is guaranteed that the writes are issued in program order
and they arrive at the same device in program order (the definition of
the device is still debatable). Arriving at different devices in
program order is not guaranteed (depends on the bus configuration).
They also don't do any endianness conversions.
On other architectures like PPC I think they are completely out of order.
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg2);
>
> This will insert a barrier() so we know that the CPU will execute the
> write of a before the write of b. However it does not mandate that
> reg2 is written with b before reg1 is written with a at the hardware
> register level.
These are as the __raw_* variants, only that they do endianness
conversion (to/from little endian).
> writel(a, reg1);
> writel(b, reg2);
>
> This actually pushes the value all the way through so that you know
> that the values have landed in the hardware after each statement.
writel_relaxed() with barrier before (DSB + outer cache sync). This is
to ensure ordering with accesses to coherent DMA buffers (Normal
Non-cacheable memory). For accesses to the same device we don't
actually need any barriers on ARM as this is guaranteed by the
architecture.
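The layering described here (writel() is writel_relaxed() with a barrier in front) can be sketched in user-space C as follows. This is an assumption-laden model, not the ARM kernel code: the sketch_* names and the counter are invented for illustration, a C11 fence merely stands in for the DSB plus outer cache sync and is not equivalent to it, and the byte-order conversion is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many barriers were issued; exists only to make the model observable. */
static unsigned int sketch_barriers_issued;

/* Stand-in for DSB + outer cache sync. A C11 fence is NOT the same thing. */
static void sketch_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    sketch_barriers_issued++;
}

/* Relaxed write: a single volatile store, no barrier (byte swap omitted). */
static void sketch_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
    assert(reg != NULL);
    *reg = val;
}

/* Full write: barrier first, so earlier memory writes are ordered before the store. */
static void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
    sketch_barrier();
    sketch_writel_relaxed(val, reg);
}
```

The barrier sits before the store, not after it, which is why it orders earlier accesses (such as DMA buffer writes) against the register write but says nothing about when that write itself completes.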
> What we would like to know is the effect of things like this:
>
> __raw_writel(a, reg1);
> __raw_writel(b, reg1);
> __raw_writel(c, reg1);
>
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg1);
> writel_relaxed(c, reg1);
>
> My *guess* is that in the first case the pipeline may even remove
> the writes of a and b to reg1 since it only cares about the end
> result (insert the volatile story in
> Documentation/volatile_considered_harmful.txt here)
I think it should be ok with a well-behaved compiler.
What I would like is to convert the __raw_* accessors to inline asm, but
Russell objected in the past. There are other reasons for this - the
current volatile access can generate post-indexed accesses which are
more expensive to emulate by a hypervisor (on Cortex-A15).
> The second case (writel_relaxed() to the same register) would
> make sure that the writes actually happen in sequence,
> but after the last statement it may take a while before the
> actual hardware write happens.
No difference from the __raw_* ones.
With all of them, even writel(), you don't know when the actual
hardware write happens. You can issue a DSB after to ensure write
completion (I think defined as the device begins to change its state
as a result of the write) but you can't tell whether the device
finished changing its state. For that you would need other means like
reading back from the device.
> And what about this:
>
> writel_relaxed(a, reg1);
> writel_relaxed(b, reg1);
> writel(c, reg1);
>
> I *think* this means that the writes will be done in sequence,
> and after the last statement you know all writes have commenced.
Here, because we use a DSB at the beginning of writel(), you know that
the previous writel_relaxed() calls completed but not the final
writel(). And anyway, the DSB in that was meant for DMA buffer
accesses and not previous I/O accesses.
--
Catalin
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-20 10:55 ` Catalin Marinas
@ 2011-05-27 12:07 ` Jamie Lokier
2011-05-27 14:37 ` Catalin Marinas
2011-05-27 14:14 ` Joakim BECH
2011-05-31 7:12 ` viresh kumar
2 siblings, 1 reply; 16+ messages in thread
From: Jamie Lokier @ 2011-05-27 12:07 UTC (permalink / raw)
To: linux-arm-kernel
Catalin Marinas wrote:
> With all of them, even writel(), you don't know when the actual
> hardware write happens. You can issue a DSB after to ensure write
> completion (I think defined as the device begins to change its state
> as a result of the write) but you can't tell whether the device
> finished changing its state. For that you would need other means like
> reading back from the device.
With PCI on x86, writel() may not reach the hardware immediately (PCI
queues the writes), so I believe you need readl() to some register on
the same device to ensure the write has been acknowledged by the
device.
Does DSB on ARM replace the need for that readl()?
-- Jamie
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-20 10:55 ` Catalin Marinas
2011-05-27 12:07 ` Jamie Lokier
@ 2011-05-27 14:14 ` Joakim BECH
2011-05-27 15:02 ` Catalin Marinas
2011-05-27 18:04 ` Russell King - ARM Linux
2011-05-31 7:12 ` viresh kumar
2 siblings, 2 replies; 16+ messages in thread
From: Joakim BECH @ 2011-05-27 14:14 UTC (permalink / raw)
To: linux-arm-kernel
Hi, I'm one of the reasons Linus wrote this question.
I get a bit confused about the feedback in this issue. Listening to Russell
and Arnd I get the impression that I should not even think about using __raw_writel.
But from Catalin's answer I got the impression it's indeed possible to use
__raw_writel on an ARM architecture and I'm writing to such a device. Could
you please clarify this?
In my particular setup I'm writing a driver that is talking to a crypto
hardware and the architecture is ARM. The hardware is mapped as io-mapped
memory in our driver. Our main problem is to program the hardware, by
writing to certain registers in order and when the hardware is ready to
accept input, we will write the bulk-data to the same register over and over
again until there is no more data. I.e. in principle we do like this (a bit
simplified).
write(a, reg1); // Setup hardware
write(a, reg2); // Setup hardware
write(a, reg3); // Write bulk data
write(a, reg3); // Write bulk data ...
Some of you wrote that the only way to be sure that the data has been
written is to read the value after writing it. Here we have another problem.
Since it's a cryptographic device some registers on the hardware are
write-only, and some registers are actually implemented as a stack in the
hardware itself. If you write a value to a register it will be pushed onto a
stack in the hardware and if you read the same register you pop the value
from the stack in the hardware.
Do you understand my problems and do you have any suggestions for me how to
handle it? The initial problem, looking at using __raw_writel, was actually
to improve performance. I noticed about 50% better throughput when I was
using __raw_writel/readl instead of writel/readl, but Linus warned me about the
problems he mentioned in the initial message in this thread.
// Joakim B
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 12:07 ` Jamie Lokier
@ 2011-05-27 14:37 ` Catalin Marinas
0 siblings, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2011-05-27 14:37 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 27, 2011 at 01:07:26PM +0100, Jamie Lokier wrote:
> Catalin Marinas wrote:
> > With all of them, even writel(), you don't know when the actual
> > hardware write happens. You can issue a DSB after to ensure write
> > completion (I think defined as the device begins to change its state
> > as a result of the write) but you can't tell whether the device
> > finished changing its state. For that you would need other means like
> > reading back from the device.
>
> With PCI on x86, writel() may not reach the hardware immediately (PCI
> queues the writes), so I believe you need readl() to some register on
> the same device to ensure the write has been acknowledged by the
> device.
>
> Does DSB on ARM replace the need for that readl()?
The short answer - no.
The long answer, it depends on the bus configuration. For AMBA (and
probably other configurations without complex bridges), DSB guarantees
that the write arrived at the device but does not guarantee that the
device state has been changed. This is enough if you only care about the
ordering in which the writes arrive at two different devices. However,
it is not enough if you rely on some device state being changed before
you do something else which assumes a certain state of that device. A
read back may work but this is device-dependent (though I think commonly
accepted).
I don't think the PCI configurations are different on ARM than x86 as
same chips are used. On PCI you may get a bridge acknowledging the
receiving of a write (which the DSB waits for) even though the device
hasn't got it yet. But as above, if you care about the device state
actually being changed, you need a read back.
--
Catalin
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 14:14 ` Joakim BECH
@ 2011-05-27 15:02 ` Catalin Marinas
2011-05-27 16:16 ` Arnd Bergmann
2011-05-27 18:06 ` Russell King - ARM Linux
2011-05-27 18:04 ` Russell King - ARM Linux
1 sibling, 2 replies; 16+ messages in thread
From: Catalin Marinas @ 2011-05-27 15:02 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 27, 2011 at 03:14:18PM +0100, Joakim BECH wrote:
> I get a bit confused about the feedback in this issue. Listening to Russell
> and Arnd I get the impression that I should not even think about using __raw_writel.
> But from Catalin's answer I got the impression it's indeed possible to use
> __raw_writel on an ARM architecture and I'm writing to such a device. Could
> you please clarify this?
I would advise using either writel() or writel_relaxed(), depending
on your needs, rather than using __raw_writel() directly (even though on
little endian configurations it is the same as the relaxed variant).
__raw_writel() is the raw implementation used by the other accessors.
I think Arnd's point was that the writel_relaxed() variant may be even
more relaxed on other architectures than it is on ARM. I can't talk
about them.
> In my particular setup I'm writing a driver that is talking to a crypto
> hardware and the architecture is ARM. The hardware is mapped as io-mapped
> memory in our driver. Our main problem is to program the hardware, by
> writing to certain registers in order and when the hardware is ready to
> accept input, we will write the bulk-data to the same register over and over
> again until there is no more data. I.e. in principle we do like this (a bit
> simplified).
>
> write(a, reg1); // Setup hardware
> write(a, reg2); // Setup hardware
> write(a, reg3); // Write bulk data
> write(a, reg3); // Write bulk data ...
>
> Some of you wrote that the only way to be sure that the data has been
> written is to read the value after writing it.
The question is - why do you need to make sure that the data has been
written? To me it looks like you only care about register access
ordering.
Writing the device registers will always be done in program order, so
your data in reg3 will be written after you set up the hardware via reg1
and reg2. The reading back isn't needed and you could even use the
relaxed accessors.
There are exceptions depending on your device. If the specs explicitly
ask for a certain delay between writing reg1/reg2 and pushing the data into
reg3, then you need to take this into account (using either udelay or
some read back from the device).
> Here we have another problem.
> Since it's a cryptographic device some registers on the hardware are
> write-only, and some registers are actually implemented as a stack in the
> hardware itself. If you write a value to a register it will be pushed onto a
> stack in the hardware and if you read the same register you pop the value
> from the stack in the hardware.
As I said above, you don't need to do this. The read back is usually
needed for things like clearing an error state at the device and then
enabling the IRQ for that device at the GIC level (though this
particular case isn't that simple).
Another example is cancelling some DMA transfer before you start
modifying the data in RAM. Here you need to make sure that the device
actually stopped after the write to a register and a common scenario is
to read back from the device.
But *note* that from a bus (PCI etc.) perspective it does not need to be
the same register on that device. From a device perspective (like
required delays etc.), it should be specified in the specs.
> Do you understand my problems and do you have any suggestions for me how to
> handle it? The initial problem, looking at using __raw_writel, was actually
> to improve performance. I noticed about 50% better throughput when I was
> using __raw_writel/readl instead of writel/readl, but Linus warned me about the
> problems he mentioned in the initial message in this thread.
I would say just use the writel_relaxed/readl_relaxed variants unless
you have DMA transfers (but it looks more like PIO from your example).
--
Catalin
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 15:02 ` Catalin Marinas
@ 2011-05-27 16:16 ` Arnd Bergmann
2011-05-27 16:38 ` Catalin Marinas
2011-05-27 18:06 ` Russell King - ARM Linux
1 sibling, 1 reply; 16+ messages in thread
From: Arnd Bergmann @ 2011-05-27 16:16 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 27 May 2011, Catalin Marinas wrote:
> I think Arnd's point was that the writel_relaxed() variant may be even
> more relaxed on other architectures than it is on ARM. I can't talk
> about them.
Right. In general, you should write device drivers as portably as
possible, so don't rely on architecture specific behavior.
> > In my particular setup I'm writing a driver that is talking to a crypto
> > hardware and the architecture is ARM. The hardware is mapped as io-mapped
> > memory in our driver. Our main problem is to program the hardware, by
> > writing to certain registers in order and when the hardware is ready to
> > accept input, we will write the bulk-data to the same register over and over
> > again until there is no more data. I.e. in principle we do like this (a bit
> > simplified).
> >
> > write(a, reg1); // Setup hardware
> > write(a, reg2); // Setup hardware
> > write(a, reg3); // Write bulk data
> > write(a, reg3); // Write bulk data ...
> >
> > Some of you wrote that the only way to be sure that the data has been
> > written is to read the value after writing it.
>
> The question is - why do you need to make sure that the data has been
> written? To me it looks like you only care about register access
> ordering.
>
> Writing the device registers will always be done in program order, so
> your data in reg3 will be written after you set up the hardware via reg1
> and reg2. The reading back isn't needed and you could even use the
> relaxed accessors.
... as long as it is the same device. But that seems to be the case
here. Only if the registers are in two separate areas of MMIO space,
e.g the same device attached to two buses, ordering may not
be what you expect by the time the data arrives at the device.
> > Here we have another problem.
> > Since it's a cryptographic device some registers on the hardware are
> > write-only, and some registers are actually implemented as a stack in the
> > hardware itself. If you write a value to a register it will be pushed onto a
> > stack in the hardware and if you read the same register you pop the value
> > from the stack in the hardware.
>
> As I said above, you don't need to do this. The read back is usually
> needed for things like clearing an error state at the device and then
> enabling the IRQ for that device at the GIC level (though this
> particular case isn't that simple).
To give yet another example: consider a device that has a register to
enable level triggered interrupts. You may want to disable the
interrupt by writing to the register during the interrupt handler.
When you return from the interrupt handler, the CPU's IRQ mask
is opened while the write may be still in flight. The only way
to avoid that is to read back the interrupt mask from the device
to flush out the write and make sure it's actually disabled before
you reopen the irqs at the CPU.
> > Do you understand my problems and do you have any suggestions for me how to
> > handle it? The initial problem, looking at using __raw_writel, was actually
> > to improve performance. I noticed about 50% better throughput when I was
> > using __raw_writel/readl instead of writel/readl, but Linus warned me about the
> > problems he mentioned in the initial message in this thread.
>
> I would say just use the writel_relaxed/readl_relaxed variants unless
> you have DMA transfers (but it looks more like PIO from your example).
When you use the *_relaxed variants, you also need to add explicit
barriers (wmb() and rmb()) to synchronize with spinlocks. Otherwise,
an out-of-order CPU might move the MMIO access outside of the critical
section.
Arnd
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 16:16 ` Arnd Bergmann
@ 2011-05-27 16:38 ` Catalin Marinas
2011-05-27 17:10 ` Arnd Bergmann
0 siblings, 1 reply; 16+ messages in thread
From: Catalin Marinas @ 2011-05-27 16:38 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 27, 2011 at 05:16:09PM +0100, Arnd Bergmann wrote:
> On Friday 27 May 2011, Catalin Marinas wrote:
> > > Here we have another problem. Since it's a cryptographic device
> > > some registers on the hardware are write-only, and some registers
> > > are actually implemented as a stack in the hardware itself. If you
> > > write a value to a register it will be pushed onto a stack in the
> > > hardware and if you read the same register you pop the value from
> > > the stack in the hardware.
> >
> > As I said above, you don't need to do this. The read back is usually
> > needed for things like clearing an error state at the device and
> > then enabling the IRQ for that device at the GIC level (though this
> > particular case isn't that simple).
>
> To give yet another example: consider a device that has a register to
> enable level triggered interrupts. You may want to disable the
> interrupt by writing to the register during the interrupt handler.
> When you return from the interrupt handler, the CPU's IRQ mask is
> opened while the write may be still in flight. The only way to avoid
> that is to read back the interrupt mask from the device to flush out
> the write and make sure it's actually disabled before you reopen the
> irqs at the CPU.
As I said above, that's not a simple case and you could still get a
spurious interrupt. Based on feedback from the hw people, even if you
lower the interrupt level at the device (by writing and reading back
from the device), there is a delay in the signal propagation and
enabling the interrupts at the CPU level (or interrupt controller level)
could still trigger the interrupt. You can add extra delay to reduce the
chances but that's SoC specific. Anyway, reading back from the device
makes the likelihood much smaller.
--
Catalin
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 16:38 ` Catalin Marinas
@ 2011-05-27 17:10 ` Arnd Bergmann
0 siblings, 0 replies; 16+ messages in thread
From: Arnd Bergmann @ 2011-05-27 17:10 UTC (permalink / raw)
To: linux-arm-kernel
On Friday 27 May 2011 18:38:29 Catalin Marinas wrote:
> As I said above, that's not a simple case and you could still get a
> spurious interrupt. Based on feedback from the hw people, even if you
> lower the interrupt level at the device (by writing and reading back
> from the device), there is a delay in the signal propagation and
> enabling the interrupts at the CPU level (or interrupt controller level)
> could still trigger the interrupt. You can add extra delay to reduce the
> chances but that's SoC specific. Anyway, reading back from the device
> makes the likelihood much smaller.
Ah, right. I was thinking of message signaled interrupts, but they
are probably not so common on ARM. With MSI, the interrupt message
would also get flushed by the readl().
Arnd
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 14:14 ` Joakim BECH
2011-05-27 15:02 ` Catalin Marinas
@ 2011-05-27 18:04 ` Russell King - ARM Linux
1 sibling, 0 replies; 16+ messages in thread
From: Russell King - ARM Linux @ 2011-05-27 18:04 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 27, 2011 at 04:14:18PM +0200, Joakim BECH wrote:
> write(a, reg1); // Setup hardware
> write(a, reg2); // Setup hardware
> write(a, reg3); // Write bulk data
> write(a, reg3); // Write bulk data ...
>
> Some of you wrote that the only way to be sure that the data has been
> written is to read the value after writing it. Here we have another problem.
> Since it's a cryptographic device some registers on the hardware are
> write-only, and some registers are actually implemented as a stack in the
> hardware itself. If you write a value to a register it will be pushed onto a
> stack in the hardware and if you read the same register you pop the value
> from the stack in the hardware.
Whichever interface you use on ARM, you will get the writes occurring in
order provided the registers are local to each other. What you can't
guarantee is the relative ordering of those writes with respect to other
memory accesses, nor when the writes will actually hit the hardware.
(There is hardware which has weird partitioning and allows writes to the
same _device_ to bypass each other and I personally consider this insane.)
> Do you understand my problems and do you have any suggestions for me how to
> handle it? The initial problem, looking at using __raw_writel, was actually
> to improve performance. I noticed about 50% better throughput when I was
> using __raw_writel/readl instead of writel/readl, but Linus warned me about the
> problems he mentioned in the initial message in this thread.
If you're writing a stream of data to a register, rather than coding a
for() loop and writel/readl, use readsl or writesl. These pre-calculate
the cookie->address conversion, and then just get on with writing the
stream to the register. They don't intersperse a barrier either (and
we don't currently add any barrier to them as we don't expect there to
be any ordering issues with memory accesses.)
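A rough sketch of the suggestion above, in user-space C so it can be compiled and checked outside the kernel. The struct mock_fifo type and the mock_fifo_write(), mock_writesl() and push_block() helpers are hypothetical; mock_writesl() only imitates the shape of the kernel's writesl(addr, buffer, count), which writes count 32-bit words from a buffer to one fixed register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock data register: every store is logged so the order can be checked.
 * This stands in for a write-only FIFO register on the device. */
struct mock_fifo {
    uint32_t log[16];
    size_t n;
};

static void mock_fifo_write(struct mock_fifo *f, uint32_t val)
{
    assert(f->n < 16);
    f->log[f->n++] = val;
}

/* Stand-in for writesl(): the register address is resolved once, then
 * the words are streamed to it with no barrier between the stores. */
static void mock_writesl(struct mock_fifo *reg, const uint32_t *buf,
                         unsigned int count)
{
    while (count--)
        mock_fifo_write(reg, *buf++);
}

/* Push one block of bulk data. In a driver this would be a single call
 * such as writesl(base + DATA_REG, block, words), where DATA_REG is a
 * hypothetical offset, instead of a for() loop around writel(). */
static void push_block(struct mock_fifo *data_reg, const uint32_t *block,
                       unsigned int words)
{
    mock_writesl(data_reg, block, words);
}
```

Because the stream targets a single register, this also fits the write-only and stack-like registers described in the question: nothing is ever read back from the data register.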
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 15:02 ` Catalin Marinas
2011-05-27 16:16 ` Arnd Bergmann
@ 2011-05-27 18:06 ` Russell King - ARM Linux
2011-05-29 9:24 ` Catalin Marinas
1 sibling, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2011-05-27 18:06 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 27, 2011 at 04:02:42PM +0100, Catalin Marinas wrote:
> I think Arnd's point was that the writel_relaxed() variant may be even
> more relaxed on other architectures than it is on ARM. I can't talk
> about them.
Actually, writel_relaxed() doesn't exist outside ARM, it's not part of
the IO specification. That's why it shouldn't be used in generic drivers
until it's become part of the IO specification _first_.
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-27 18:06 ` Russell King - ARM Linux
@ 2011-05-29 9:24 ` Catalin Marinas
0 siblings, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2011-05-29 9:24 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, May 27, 2011 at 07:06:25PM +0100, Russell King - ARM Linux wrote:
> On Fri, May 27, 2011 at 04:02:42PM +0100, Catalin Marinas wrote:
> > I think Arnd's point was that the writel_relaxed() variant may be even
> > more relaxed on other architectures than it is on ARM. I can't talk
> > about them.
>
> Actually, writel_relaxed() doesn't exist outside ARM, it's not part of
> the IO specification. That's why it shouldn't be used in generic drivers
> until it's become part of the IO specification _first_.
The read*_relaxed accessors are provided by many other architectures. The
write*_relaxed accessors seem to be provided by arm and sh only. But it is
not part of the IO specification. It should become part of it, as this was
the suggested improvement during the barrier threads we had on linux-arch
in the past. I can write a few simple patches and post them on linux-arch
for review. I think the tricky part is getting an agreement with the
other architectures on how relaxed they can be.
--
Catalin
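One conservative way an architecture without write*_relaxed could supply it is to fall back to the fully ordered accessor, which is always correct but gives up the performance benefit. The sketch below models that idea in user-space C; mock_writel(), mock_writel_relaxed(), barrier_count and program_timer() are hypothetical names, and whether such a fallback is what linux-arch would agree on is not settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many "barriers" were issued, so the cost is visible. */
static unsigned int barrier_count;

/* Stand-in for the fully ordered accessor: barrier, then the store. */
static void mock_writel(uint32_t val, volatile uint32_t *addr)
{
    barrier_count++;  /* models the barrier implied by writel() */
    *addr = val;
}

/* Fallback: an architecture that defines nothing more relaxed simply
 * maps the relaxed name onto the ordered accessor. */
#ifndef mock_writel_relaxed
#define mock_writel_relaxed(val, addr) mock_writel((val), (addr))
#endif

/* A driver can then use the relaxed name where ordering against memory
 * does not matter, and the ordered one for the final, committing write. */
static void program_timer(volatile uint32_t *regs, uint32_t load,
                          uint32_t ctrl)
{
    assert(regs != NULL);
    mock_writel_relaxed(load, &regs[0]);  /* setup value */
    mock_writel(ctrl, &regs[1]);          /* start the timer */
}
```

With the fallback in effect both stores pay for a barrier; an architecture with a genuinely relaxed accessor would define the relaxed name itself and skip the first one.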
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-20 10:55 ` Catalin Marinas
2011-05-27 12:07 ` Jamie Lokier
2011-05-27 14:14 ` Joakim BECH
@ 2011-05-31 7:12 ` viresh kumar
2011-05-31 8:53 ` Catalin Marinas
2 siblings, 1 reply; 16+ messages in thread
From: viresh kumar @ 2011-05-31 7:12 UTC (permalink / raw)
To: linux-arm-kernel
Catalin,
On 05/20/2011 04:25 PM, Catalin Marinas wrote:
> On ARM it is guaranteed that the writes are issued in program order
> and they arrive at the same device in program order (the definition of
> the device is still debatable). Arriving at different devices in
> program order is not guaranteed (depends on the bus configuration).
> They also don't do any endianness conversions.
Is this also guaranteed on ARMv7, where we have two ports coming
out of ARM? There can now be two access paths to devices, so accesses
might arrive out of order.
--
viresh
* On __raw_readl readl_relaxed and readl nocheinmal
2011-05-31 7:12 ` viresh kumar
@ 2011-05-31 8:53 ` Catalin Marinas
0 siblings, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2011-05-31 8:53 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 31, 2011 at 08:12:38AM +0100, viresh kumar wrote:
> On 05/20/2011 04:25 PM, Catalin Marinas wrote:
> > On ARM it is guaranteed that the writes are issued in program order
> > and they arrive at the same device in program order (the definition of
> > the device is still debatable). Arriving at different devices in
> > program order is not guaranteed (depends on the bus configuration).
> > They also don't do any endianness conversions.
>
> Is this also guaranteed on ARMv7, where we have two ports coming
> out of ARM? There can now be two access paths to devices, so accesses
> might arrive out of order.
The order to different devices is not guaranteed. The order to the same
device is guaranteed, but the issue here is the device definition. For a
device address range greater than 1KB (the minimum specified in the ARM
ARM), a single device can be connected to both ports, and accesses to
registers which are not in the same 1KB block may arrive out of order.
While I don't think this should be done in practice (or at least the 1KB
range should be larger), it is already present in certain
implementations.
--
Catalin
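To make the 1KB rule above concrete, here is a small sketch that checks whether two register addresses fall in the same 1KB block. It assumes the blocks are 1KB-aligned, which the message does not state explicitly, and the names DEVICE_ORDER_BLOCK, same_order_block() and check_regs_share_block() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum "device" granule mentioned in the thread (assumed aligned). */
#define DEVICE_ORDER_BLOCK 1024u

/* True if both register addresses lie in the same 1KB block, i.e. the
 * case where writes are expected to arrive in program order even if
 * the device is wired to both ports. */
static bool same_order_block(uintptr_t reg_a, uintptr_t reg_b)
{
    return (reg_a / DEVICE_ORDER_BLOCK) == (reg_b / DEVICE_ORDER_BLOCK);
}

/* Debug helper: trap when a code path that relies on implicit ordering
 * touches registers in different blocks. */
static void check_regs_share_block(uintptr_t reg_a, uintptr_t reg_b)
{
    assert(same_order_block(reg_a, reg_b));
}
```

Registers that fail this check on such an implementation would need an explicit barrier or a read-back between the accesses if their relative order matters.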
end of thread, other threads:[~2011-05-31 8:53 UTC | newest]
Thread overview: 16+ messages
2011-05-20 10:04 On __raw_readl readl_relaxed and readl nocheinmal Linus Walleij
2011-05-20 10:42 ` Russell King - ARM Linux
2011-05-20 10:50 ` Arnd Bergmann
2011-05-20 10:55 ` Catalin Marinas
2011-05-27 12:07 ` Jamie Lokier
2011-05-27 14:37 ` Catalin Marinas
2011-05-27 14:14 ` Joakim BECH
2011-05-27 15:02 ` Catalin Marinas
2011-05-27 16:16 ` Arnd Bergmann
2011-05-27 16:38 ` Catalin Marinas
2011-05-27 17:10 ` Arnd Bergmann
2011-05-27 18:06 ` Russell King - ARM Linux
2011-05-29 9:24 ` Catalin Marinas
2011-05-27 18:04 ` Russell King - ARM Linux
2011-05-31 7:12 ` viresh kumar
2011-05-31 8:53 ` Catalin Marinas