* Re: I/O read, write implementation questions
2006-04-25 14:13 I/O read, write implementation questions Zoltan Menyhart
@ 2006-04-25 14:46 ` David Mosberger-Tang
2006-04-25 15:20 ` Zoltan Menyhart
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: David Mosberger-Tang @ 2006-04-25 14:46 UTC (permalink / raw)
To: linux-ia64
Zoltan,
On 4/25/06, Zoltan Menyhart <Zoltan.Menyhart@bull.net> wrote:
> The SDM shows on pages 2:585-586 how the I/O space reads and
> writes have to be implemented, e.g.:
>
> outb: ...
> mf
> st1.rel [port_addr] = in1
> mf.a
> mf
>
> inb: ...
> mf
> ld1.acq r8 = [port_addr]
> mf.a
> mf
>
> The actual implementation does not include the pairs of "mf"s.
> Can someone please explain to me why they are left out?
Linux itself supports weak memory ordering. You want to minimize the
number of fences in low-level primitives because they're expensive and
you only want to pay the price for them when they're really needed.
> The following code sequence:
>
> outb(data, port_addr);
> flag = 1;
>
> may be compiled as:
>
> add r8 = 1, r0
> add r2 = flag_offs, r1
> ;;
> st1.rel [port_addr] = data
> mf.a
> st1 [r2] = r8
>
> What prevents "st1 [r2] = r8" from being seen before
> "st1.rel [port_addr] = data" is seen?
Nothing. Why *should* an unordered store be ordered with respect to
outb()? If you want ordering, either declare "flag" volatile or add
an explicit barrier.
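A minimal user-space sketch of the "explicit barrier" option, for readers who want to see where the barrier sits. It uses C11 atomics, not kernel primitives: `fake_port` is a hypothetical stand-in for the I/O port (ordinary memory, not a device), `fake_outb()`, `write_then_signal()` and `read_if_signalled()` are illustrative names, and `atomic_thread_fence()` plays the role an explicit barrier would play in a driver. It says nothing about I/O acceptance.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a device port; ordinary memory in this sketch. */
static _Atomic uint8_t fake_port;
/* Flag that another thread may poll. */
static atomic_int flag;

/* Illustrative helper, not the kernel's outb(): an unordered store. */
static void fake_outb(uint8_t data)
{
    atomic_store_explicit(&fake_port, data, memory_order_relaxed);
}

/* Write the port, then publish the flag, with a fence in between. */
static void write_then_signal(uint8_t data)
{
    fake_outb(data);
    /* Explicit barrier: the port store above is ordered before the
     * flag store below for any reader that acquires the flag. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Reader side: returns the port value once the flag is set, else -1. */
static int read_if_signalled(void)
{
    if (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        return -1;
    return atomic_load_explicit(&fake_port, memory_order_relaxed);
}
```

Without the fence, nothing in the C11 model stops the flag store from becoming visible first, which mirrors the "Nothing" answer above.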
> Why do not "readb()" ... "writeb()" include "mf.a"-s?
Again, acceptance is not normally needed by readX/writeX and mf.a is
extremely expensive (on the order of 1,000 cycles). If you want
ordering, you need to use explicit barriers (or rely on the effect of
"volatile" in ia64-specific code).
--david
--
Mosberger Consulting LLC, http://www.mosberger-consulting.com/
* Re: I/O read, write implementation questions
From: Zoltan Menyhart @ 2006-04-25 15:20 UTC (permalink / raw)
To: linux-ia64
David Mosberger-Tang wrote:
>>Why do not "readb()" ... "writeb()" include "mf.a"-s?
>
> Again, acceptance is not normally needed by readX/writeX and mf.a is
> extremely expensive (on the order of 1,000 cycles). If you want
> ordering, you need to use explicit barriers (or rely on the effect of
> "volatile" in ia64-specific code).
Assuming a device driver uses memory-mapped I/O, what is the
architecture-independent way to make sure that the I/O reads and writes
are accepted? (I cannot use "__ia64_mf_a()".)
What is the difference between "readb_relaxed()" and "readb()"?
Weren't they defined to provide both strict and weak (relaxed)
I/O ordering?
Thanks,
Zoltan
* Re: I/O read, write implementation questions
From: Grant Grundler @ 2006-04-25 23:20 UTC (permalink / raw)
To: linux-ia64
On Tue, Apr 25, 2006 at 05:20:49PM +0200, Zoltan Menyhart wrote:
> David Mosberger-Tang wrote:
>
> >>Why do not "readb()" ... "writeb()" include "mf.a"-s?
> >
> >Again, acceptance is not normally needed by readX/writeX and mf.a is
> >extremely expensive (on the order of 1,000 cycles). If you want
> >ordering, you need to use explicit barriers (or rely on the effect of
> >"volatile" in ia64-specific code).
MMIO reads cost between 1,000 and 3,000 cycles anyway, depending on the
configuration. "mmio_test" on gnumonks.org/svn/mmio_test (roughly)
will help people measure the exact cost.
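For illustration only, here is a rough sketch of how one might time repeated volatile reads in plain C. It is not a reimplementation of mmio_test: `avg_read_ns()` is a hypothetical helper, the pointer passed to it is assumed to be whatever the caller wants to probe (ordinary memory in user space; a real measurement would need a mapped device register), and the result is in nanoseconds of CPU time, so converting to cycles requires knowing the clock frequency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Average CPU time, in nanoseconds, of one volatile 32-bit read from reg. */
static double avg_read_ns(const volatile uint32_t *reg, unsigned long iters)
{
    assert(reg != NULL && iters > 0);

    uint32_t sink = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++)
        sink ^= *reg;   /* volatile: the compiler must perform every read */
    clock_t end = clock();
    (void)sink;         /* the value itself is not interesting */

    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

The resolution of `clock()` is coarse, so a large iteration count is needed before the average means anything.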
> Assuming a device driver uses memory-mapped I/O, what is the
> architecture-independent way to make sure that the I/O reads and writes
> are accepted? (I cannot use "__ia64_mf_a()".)
I think it depends on your definition of "accepted".
I tried addressing this question before:
http://iou.parisc-linux.org/porting_zx1/4_4MMIO_Write_Ordering.html
I believe for mmio writes, the section on "Posted MMIO Writes" in
the same paper answers your question.
The CPU will stall for MMIO reads and thus only needs mb() or wmb()
depending on what ordering is required.
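The difference between a posted write and a stalling read can be sketched with a toy model. Everything here is hypothetical (`struct toy_bridge`, `toy_writel()`, `toy_readl()`, the queue depth); it is a user-space simulation of the idea that a write may sit in a bridge queue while a read forces earlier writes through, not a description of any particular chipset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NREGS  4
#define QDEPTH 8

/* One write that has been posted but not yet delivered. */
struct posted_write { size_t reg; uint32_t val; };

/* Toy bridge sitting between the CPU and a device. */
struct toy_bridge {
    uint32_t dev_regs[NREGS];           /* what the device has actually seen */
    struct posted_write queue[QDEPTH];  /* posted, not yet delivered */
    size_t queued;
};

/* Deliver every queued write to the device, in order. */
static void toy_drain(struct toy_bridge *b)
{
    for (size_t i = 0; i < b->queued; i++)
        b->dev_regs[b->queue[i].reg] = b->queue[i].val;
    b->queued = 0;
}

/* Posted write: returns as soon as the write is queued. */
static void toy_writel(struct toy_bridge *b, size_t reg, uint32_t val)
{
    assert(reg < NREGS);
    if (b->queued == QDEPTH)
        toy_drain(b);   /* queue full: deliver before accepting more */
    b->queue[b->queued++] = (struct posted_write){ reg, val };
}

/* Read: not posted, so all earlier writes are delivered first. */
static uint32_t toy_readl(struct toy_bridge *b, size_t reg)
{
    assert(reg < NREGS);
    toy_drain(b);
    return b->dev_regs[reg];
}
```

In this model a `toy_writel()` followed by a `toy_readl()` of any register on the same device leaves the queue empty, which is why a read-back is a common way to make sure posted writes have arrived.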
> What is the difference between "readb_relaxed()" and "readb()"?
> Weren't they defined to provide both strict and weak (relaxed)
> I/O ordering?
This is a hack added for SGI Altix. The "_relaxed()" in this
case refers to PCI ordering rules that may be violated by
SGI hardware (and the violation causes no harm). Search for the
read_relaxed (or was it write_relaxed?) discussion between Jesse Barnes
and myself from about 2 years ago.
hth,
grant
* Re: I/O read, write implementation questions
From: Brent Casavant @ 2006-04-26 15:48 UTC (permalink / raw)
To: linux-ia64
On Tue, 25 Apr 2006, Grant Grundler wrote:
> On Tue, Apr 25, 2006 at 05:20:49PM +0200, Zoltan Menyhart wrote:
> > Assuming a device driver uses memory-mapped I/O, what is the
> > architecture-independent way to make sure that the I/O reads and writes
> > are accepted? (I cannot use "__ia64_mf_a()".)
>
> I think it depends on your definition of "accepted".
> I tried addressing this question before:
> http://iou.parisc-linux.org/porting_zx1/4_4MMIO_Write_Ordering.html
>
> I believe for mmio writes, the section on "Posted MMIO Writes" in
> the same paper answers your question.
> The CPU will stall for MMIO reads and thus only needs mb() or wmb()
> depending on what ordering is required.
Note that wmb() is not sufficient for writes, at least on SGI Altix.
wmb() ensures that the write has been issued from the processor, but
it does not ensure that the I/O device itself has seen the write.
mmiowb() is the appropriate call to ensure that the write has
been seen.
Interested parties may want to read further for a description of
the Altix IO write ordering issues... everyone else can move along. :)
On Altix, an IO write may be cached by the CPU's local Shub ASIC
until another agent on the NUMAlink network indicates that the write
has been accepted. This caching is necessary in order to handle
retries, network congestion, and other such conditions. Shub will
guarantee the ordering of writes from the CPUs locally attached to it
(i.e. a single NUMA node).
Thus, wmb() only ensures that the Shub ASIC has seen the write, not
that the target device has seen the write. If you can be sure that
all writes plus the next read (which will stall in the Shub until all
prior writes are accepted) to that device are all issued from a single
CPU, this can be sufficient to guarantee IO write ordering.
However, if you cannot guarantee that all IO writes to the device and
the next read will be issued from a single CPU (actually a single Shub),
then wmb() is insufficient, as separate Shubs do not guarantee any
particular IO ordering with respect to one another. In this case, an
mmiowb() call will ensure that the IO write has been accepted by the
target IO controller (typically a PCI bridge ASIC of some flavor).
For IA64 non-SN, mmiowb() is simply a wmb(). For SN, we poll a register
in the Shub ASIC that reports the number of outstanding IO writes
until it shows that all writes have been accepted. I can't speak to
the mmiowb() implementation outside of IA64; however, for simpler
architectures such as PCs, I believe a wmb() is likely sufficient.
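The cross-node problem and the polling fix described above can be sketched as a small single-threaded simulation. All names (`struct toy_hub`, `toy_io_write()`, `toy_mmiowb()`, the queue sizes) are hypothetical, and the model is deliberately crude: each node's hub holds writes until "the fabric" delivers them, and the mmiowb()-like step loops until the hub's outstanding count reaches zero. It is not the actual SN implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NNODES 2
#define QDEPTH 4
#define LOGLEN 8

/* Toy hub: holds I/O writes until the target has accepted them. */
struct toy_hub {
    uint32_t pending[QDEPTH];
    size_t npending;            /* models the "outstanding writes" register */
};

struct toy_system {
    struct toy_hub hub[NNODES];
    uint32_t dev_log[LOGLEN];   /* order in which the device saw the writes */
    size_t nlogged;
};

/* The fabric delivers the oldest pending write from one hub. */
static void toy_deliver_one(struct toy_system *s, int node)
{
    assert(node >= 0 && node < NNODES);
    struct toy_hub *h = &s->hub[node];
    assert(h->npending > 0 && s->nlogged < LOGLEN);

    s->dev_log[s->nlogged++] = h->pending[0];
    for (size_t i = 1; i < h->npending; i++)
        h->pending[i - 1] = h->pending[i];
    h->npending--;
}

/* I/O write from a CPU on the given node: it only reaches the local hub. */
static void toy_io_write(struct toy_system *s, int node, uint32_t val)
{
    assert(node >= 0 && node < NNODES);
    struct toy_hub *h = &s->hub[node];
    assert(h->npending < QDEPTH);
    h->pending[h->npending++] = val;
}

/* mmiowb()-like step: wait until the hub reports no outstanding writes. */
static void toy_mmiowb(struct toy_system *s, int node)
{
    assert(node >= 0 && node < NNODES);
    while (s->hub[node].npending != 0)
        toy_deliver_one(s, node);   /* real hardware drains on its own */
}
```

If node 0 writes 1, node 1 then writes 2, and the fabric happens to deliver node 1's write first, the device log reads 2, 1. Calling `toy_mmiowb()` on node 0 before node 1 is allowed to write (for example, before releasing a lock) forces the log to read 1, 2.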
Brent
--
Brent Casavant All music is folk music. I ain't
bcasavan@sgi.com never heard a horse sing a song.
Silicon Graphics, Inc. -- Louis Armstrong
* Re: I/O read, write implementation questions
From: Brent Casavant @ 2006-04-26 15:53 UTC (permalink / raw)
To: linux-ia64
On Wed, 26 Apr 2006, Brent Casavant wrote:
> Note that wmb() is not sufficient for writes, at least on SGI Altix.
> wmb() ensures that the write has been issued from the processor, but
> it does not ensure that the I/O device itself has seen the write.
> mmiowb() is the appropriate call to ensure that the write has
> been seen.
Oh, and lest I confuse anyone, wmb() is sufficient for writes to RAM.
It's only I/O writes for which it is insufficient.
Brent
--
Brent Casavant All music is folk music. I ain't
bcasavan@sgi.com never heard a horse sing a song.
Silicon Graphics, Inc. -- Louis Armstrong