* Re: [Linux-ia64] pio barriers
2001-12-10 22:47 [Linux-ia64] pio barriers Jesse Barnes
@ 2001-12-10 22:57 ` Jack Steiner
2001-12-10 23:00 ` Jesse Barnes
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jack Steiner @ 2001-12-10 22:57 UTC (permalink / raw)
To: linux-ia64
>
> I noticed that in asm-ia64/system.h there's a comment that reads:
> /*
> ...
> * Note: "mb()" and its variants cannot be used as a fence to order
> * accesses to memory mapped I/O registers. For that, mf.a needs to
> * be used. However, we don't want to always use mf.a because (a)
> * it's (presumably) much slower than mf and (b) mf.a is supported for
> * sequential memory pages only.
> */
>
> Is there a macro (e.g. piob() or mmiob()) to wrap mf.a or are users
> expected to call it explicitly when they need it? If there is no
> macro, I'd like to add one, as I think it will be necessary to
> properly support our NUMA platform.
>
There is a macro in io.h
#define __ia64_mf_a() __asm__ __volatile__ ("mf.a" ::: "memory")
However, it is not clear to me when an mf.a is actually required. Does
anyone have an opinion? I don't see any clear need for it. It seems the places
where we might want to use it actually require a flushbus().
--
Thanks
Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
* Re: [Linux-ia64] pio barriers
From: Jesse Barnes @ 2001-12-10 23:00 UTC (permalink / raw)
To: linux-ia64
Actually, I was thinking that for most platforms, mf.a should do a
'flushbus' (i.e. act as a barrier for all previous pio reads/writes).
Of course, for SN, we'll need to do other stuff too, so I'd like to
make a machine vector.
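
A minimal sketch of what such a machine vector could look like, assuming a table of per-platform function pointers. Every name here (pio_machvec, platform_pio_barrier, sn_pio_barrier, sn_flush_count) is hypothetical and is not the kernel's actual machvec interface. The SN hook only bumps a counter where the platform-specific work would go, and on non-ia64 builds mf.a is replaced by a plain compiler barrier so the sketch still compiles.

```c
/* mf.a only exists on ia64; elsewhere fall back to a compiler-only barrier. */
#if defined(__ia64__)
# define cpu_mf_a() __asm__ __volatile__ ("mf.a" ::: "memory")
#else
# define cpu_mf_a() __asm__ __volatile__ ("" ::: "memory")
#endif

/* Hypothetical machine vector: one slot per platform-specific operation. */
struct pio_machvec {
    const char *name;
    void (*pio_barrier)(void);
};

/* Counts SN barrier calls; stands in for the real SN-specific work. */
static unsigned long sn_flush_count;

/* Generic platforms: assume mf.a alone is enough. */
static void generic_pio_barrier(void)
{
    cpu_mf_a();
}

/* SN: mf.a plus "other stuff", represented here by the counter. */
static void sn_pio_barrier(void)
{
    cpu_mf_a();
    sn_flush_count++;
}

static const struct pio_machvec generic_machvec = { "generic", generic_pio_barrier };
static const struct pio_machvec sn_machvec = { "sn", sn_pio_barrier };

/* Selected once at boot in a real system; defaults to generic here. */
static const struct pio_machvec *machvec = &generic_machvec;

/* Drivers call this and never see which platform is underneath. */
static inline void platform_pio_barrier(void)
{
    machvec->pio_barrier();
}
```

The indirection costs one function-pointer call per barrier, which is probably acceptable next to the cost of the PIO accesses themselves.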
Jesse
On Mon, Dec 10, 2001 at 04:57:33PM -0600, Jack Steiner wrote:
> There is a macro in io.h
>
> #define __ia64_mf_a() __asm__ __volatile__ ("mf.a" ::: "memory")
>
>
> However, it is not clear to me when an mf.a is actually required. Does
> anyone have an opinion? I don't see any clear need for it. It seems the places
> where we might want to use it actually require a flushbus().
* Re: [Linux-ia64] pio barriers
From: David Mosberger @ 2001-12-10 23:58 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 10 Dec 2001 14:47:40 -0800, Jesse Barnes <jbarnes@sgi.com> said:
Jesse> I noticed that in asm-ia64/system.h there's a comment that reads:
Jesse> /*
Jesse> ...
Jesse> * Note: "mb()" and its variants cannot be used as a fence to order
Jesse> * accesses to memory mapped I/O registers. For that, mf.a needs to
Jesse> * be used. However, we don't want to always use mf.a because (a)
Jesse> * it's (presumably) much slower than mf and (b) mf.a is supported for
Jesse> * sequential memory pages only.
Jesse> */
Jesse> Is there a macro (e.g. piob() or mmiob()) to wrap mf.a or are users
Jesse> expected to call it explicitly when they need it? If there is no
Jesse> macro, I'd like to add one, as I think it will be necessary to
Jesse> properly support our NUMA platform.
The comment is wrong, or at least misleading (I wrote it, so hopefully
nobody is offended... ;-). mf.a is needed for inX/outX emulation, not
really for ordering. Uncached accesses are not re-ordered by the CPU
and mf will do just fine as far as ordering of cached accesses is
concerned.
Platform-acceptance is a tricky business, as it's, well, platform
dependent (note that "mf.a" doesn't really guarantee to do anything).
Can you get away with forcing the proper ordering with a dummy-read?
If so, I suspect that would be preferable as that is the only platform
independent way to do this (as far as I know).
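
A minimal sketch of the dummy-read approach, assuming a bus on which a read from a device cannot complete until earlier writes to that same device have been delivered (the usual behaviour for posted writes on PCI, but an assumption for any given platform). The register layout and the function name are hypothetical and do not come from any real driver.

```c
#include <stdint.h>

/* Hypothetical register block of an imaginary memory-mapped device. */
struct fake_dev_regs {
    volatile uint32_t doorbell;
    volatile uint32_t status;
};

/*
 * Write a register, then read back from the same device.  The read is
 * the "dummy read": its value is normally ignored, it exists only to
 * force the preceding write out to the device before we continue.
 */
static inline uint32_t dev_write_flushed(struct fake_dev_regs *regs,
                                         uint32_t value)
{
    regs->doorbell = value;   /* may be posted somewhere along the way */
    return regs->status;      /* does not return until the device answers */
}
```

A driver would normally pick a register whose read has no side effects, since some status registers clear on read.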
--david
* Re: [Linux-ia64] pio barriers
From: Jesse Barnes @ 2001-12-11 0:23 UTC (permalink / raw)
To: linux-ia64
On Mon, Dec 10, 2001 at 03:58:19PM -0800, David Mosberger wrote:
> The comment is wrong, or at least misleading (I wrote it, so hopefully
> nobody is offended... ;-). mf.a is needed for inX/outX emulation, not
> really for ordering. Uncached accesses are not re-ordered by the CPU
> and mf will do just fine as far as ordering of cached accesses is
> concerned.
On our platform, uncached accesses can arrive out of order if they come
from different cpus, so simple spinlock protection isn't quite enough
to guarantee that pios don't arrive out of order. We'll need a pio
barrier macro and (ideally) a spin_unlock variant that includes it.
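
A minimal sketch of such an unlock variant, assuming the barrier must run before the lock is released so that the next holder's PIO writes cannot overtake ours. The toy_ lock type, pio_barrier() and toy_spin_unlock_pio() are hypothetical names; the barrier body is a counter standing in for whatever the platform actually requires, and the lock is a plain C11 atomic_flag rather than the kernel's spinlock.

```c
#include <stdatomic.h>

/* Toy spinlock built on a C11 atomic flag; not the kernel's spinlock_t. */
typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

/* Counts barrier calls; stands in for the platform-specific flush. */
static unsigned long pio_barrier_calls;

static inline void pio_barrier(void)
{
    pio_barrier_calls++;   /* placeholder for the real PIO flush */
}

static inline void toy_spin_lock(toy_spinlock_t *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;
}

static inline void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Unlock variant for critical sections that issued PIO writes: flush
 * them first, then let the next CPU in.
 */
static inline void toy_spin_unlock_pio(toy_spinlock_t *l)
{
    pio_barrier();
    toy_spin_unlock(l);
}
```

Keeping the plain unlock alongside the PIO variant means critical sections that never touch a device do not pay for the flush.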
> Can you get away with forcing the proper ordering with a dummy-read?
> If so, I suspect that would be preferable as that is the only platform
> independent way to do this (as far as I know).
Do you mean that a driver should do a dummy read following writes that
have to arrive prior to others? I guess that would be just as
invasive as introducing a pio barrier or another spinlock...
Jesse
* Re: [Linux-ia64] pio barriers
From: David Mosberger @ 2001-12-11 0:44 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 10 Dec 2001 16:23:00 -0800, Jesse Barnes <jbarnes@sgi.com> said:
Jesse> On Mon, Dec 10, 2001 at 03:58:19PM -0800, David Mosberger wrote:
>> The comment is wrong, or at least misleading (I wrote it, so hopefully
>> nobody is offended... ;-). mf.a is needed for inX/outX emulation, not
>> really for ordering. Uncached accesses are not re-ordered by the CPU
>> and mf will do just fine as far as ordering of cached accesses is
>> concerned.
Jesse> On our platform, uncached accesses can arrive out of order if they come
Jesse> from different cpus, so simple spinlock protection isn't quite enough
Jesse> to guarantee that pios don't arrive out of order. We'll need a pio
Jesse> barrier macro and (ideally) a spin_unlock variant that includes it.
Yes, I realize that, but it's not the CPU that's reordering the access
so mf doesn't help and mf.a really doesn't guarantee anything either
(though it may on your platform).
>> Can you get away with forcing the proper ordering with a dummy-read?
>> If so, I suspect that would be preferable as that is the only platform
>> independent way to do this (as far as I know).
Jesse> Do you mean that a driver should do a dummy read following writes that
Jesse> have to arrive prior to others? I guess that would be just as
Jesse> invasive as introducing a pio barrier or another spinlock...
Invasive, yes. But on some platforms there may be no other way of
enforcing order. Perhaps what would be best would be a macro that
takes a device address as an argument. Depending on platform, you
could then do a dummy read from this address or use a special
instruction, such as mf.a, to enforce order.
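
A minimal sketch of a barrier macro that takes a device address, assuming the choice between the two strategies is made at build time. SKETCH_USE_MF_A, pio_barrier() and dev_kick() are hypothetical names. When the platform trusts mf.a the address is ignored; otherwise the macro performs a dummy 32-bit read from it.

```c
#include <stdint.h>

#if defined(SKETCH_USE_MF_A) && defined(__ia64__)
/* Platform where mf.a is known to flush PIO: the address is unused. */
# define pio_barrier(addr) \
    do { (void)(addr); __asm__ __volatile__ ("mf.a" ::: "memory"); } while (0)
#else
/* Everyone else: dummy read from the device the writes were aimed at. */
# define pio_barrier(addr) \
    do { (void)*(volatile uint32_t *)(addr); } while (0)
#endif

/* Example caller: write a register and make sure the write has landed. */
static inline void dev_kick(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    pio_barrier(reg);
}
```

Passing the address everywhere keeps callers identical across platforms, at the cost of an unused argument where a dedicated instruction does the work.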
--david
* Re: [Linux-ia64] pio barriers
From: Jesse Barnes @ 2001-12-11 0:55 UTC (permalink / raw)
To: linux-ia64
On Mon, Dec 10, 2001 at 04:44:42PM -0800, David Mosberger wrote:
> Yes, I realize that, but it's not the CPU that's reordering the access
> so mf doesn't help and mf.a really doesn't guarantee anything either
> (though it may on your platform).
Yeah, that's too bad. On MIPS we've got 'sync', which is implemented
to do a pio flush. Apparently IA64 doesn't have a nice way to do
something similar though, so oh well.
> Invasive, yes. But on some platforms there may be no other way of
> enforcing order. Perhaps what would be best would be a macro that
> takes a device address as an argument. Depending on platform, you
> could then do a dummy read from this address or use a special
> instruction, such as mf.a, to enforce order.
Can you think of other platforms that might need a device argument to
a potential pio barrier macro? I was tentatively thinking of
implementing it without any arguments, but I suppose we could just
ignore any arguments for our platform...
Thanks,
Jesse
* Re: [Linux-ia64] pio barriers
From: David Mosberger @ 2001-12-11 5:30 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 10 Dec 2001 16:55:48 -0800, Jesse Barnes <jbarnes@sgi.com> said:
Jesse> On Mon, Dec 10, 2001 at 04:44:42PM -0800, David Mosberger
Jesse> wrote:
>> Yes, I realize that, but it's not the CPU that's reordering the
>> access so mf doesn't help and mf.a really doesn't guarantee
>> anything either (though it may on your platform).
Jesse> Yeah, that's too bad. On MIPS we've got 'sync', which is
Jesse> implemented to do a pio flush. Apparently IA64 doesn't have
Jesse> a nice way to do something similar though, so oh well.
Well, mf.a *does* do something on the 460 chipset. So I think it's
mostly a platform issue. Are you saying that on SGI's IA-64 platforms
mf.a doesn't do the equivalent of the MIPS sync? (Just curious.)
>> Invasive, yes. But on some platforms there may be no other way
>> of enforcing order. Perhaps what would be best would be a macro
>> that takes a device address as an argument. Depending on
>> platform, you could then do a dummy read from this address or use
>> a special instruction, such as mf.a, to enforce order.
Jesse> Can you think of other platforms that might need a device
Jesse> argument to a potential pio barrier macro? I was tentatively
Jesse> thinking of implementing it without any arguments, but I
Jesse> suppose we could just ignore any arguments for our
Jesse> platform...
Well, my concern is mostly about the x86 platform. I assume x86 NUMA
machines would have the same ordering problem, so if we propose a fix
for the problem, we need to make sure it works on x86 too as otherwise
it will not make it into the official tree.
--david