public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] don't udelay() in sn_mmiob
@ 2004-05-26 21:49 Jesse Barnes
  2004-05-27 20:46 ` David Mosberger
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-05-26 21:49 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 261 bytes --]

sn_mmiob is a lightweight way to ensure PCI write ordering, intended to be 
used as an alternative to doing a PIO read.  Unfortunately, with the udelay() 
in there, it ends up being slower than a PCI read on small configurations, so 
remove it.

Thanks,
Jesse


[-- Attachment #2: no-udelay-mmiob.patch --]
[-- Type: text/x-diff, Size: 431 bytes --]

===== arch/ia64/sn/io/machvec/iomv.c 1.8 vs edited =====
--- 1.8/arch/ia64/sn/io/machvec/iomv.c	Tue Sep 16 12:00:45 2003
+++ edited/arch/ia64/sn/io/machvec/iomv.c	Wed May 26 13:49:19 2004
@@ -71,6 +71,6 @@
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
-		udelay(1);
+		;
 }
 EXPORT_SYMBOL(sn_mmiob);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
@ 2004-05-27 20:46 ` David Mosberger
  2004-05-27 21:06 ` Jesse Barnes
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2004-05-27 20:46 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 May 2004 17:49:02 -0400, Jesse Barnes <jbarnes@engr.sgi.com> said:

  Jesse> sn_mmiob is a lightweight way to ensure PCI write ordering,
  Jesse> intended to be used as an alternative to doing a PIO read.
  Jesse> Unfortunately, with the udelay() in there, it ends up being
  Jesse> slower than a PCI read on small configurations, so remove it.

Wouldn't you want at least a cpu_relax() in that loop?

	--david

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
  2004-05-27 20:46 ` David Mosberger
@ 2004-05-27 21:06 ` Jesse Barnes
  2004-05-27 21:29 ` David Mosberger
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jesse Barnes @ 2004-05-27 21:06 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 598 bytes --]

On Thursday, May 27, 2004 4:46 pm, David Mosberger wrote:
> Wouldn't you want at least a cpu_relax() in that loop?

That'll cause the CPU to switch to the other thread if it's SMT so it won't 
spin waiting for the access to complete?  If so, then yes.  Here's an updated 
patch.

sn_mmiob is a lightweight way to ensure PCI write ordering, intended to be 
used as an alternative to doing a PIO read. Unfortunately, with the udelay() 
in there, it ends up being slower than a PCI read on small configurations, so 
remove it in favor of a simple cpu_relax() so that we're HT-friendly.

Thanks,
Jesse

[-- Attachment #2: no-udelay-mmiob.patch --]
[-- Type: text/x-diff, Size: 442 bytes --]

===== arch/ia64/sn/io/machvec/iomv.c 1.8 vs edited =====
--- 1.8/arch/ia64/sn/io/machvec/iomv.c	Tue Sep 16 12:00:45 2003
+++ edited/arch/ia64/sn/io/machvec/iomv.c	Wed May 26 13:49:19 2004
@@ -71,6 +71,6 @@
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
-		udelay(1);
+		cpu_relax();
 }
 EXPORT_SYMBOL(sn_mmiob);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
  2004-05-27 20:46 ` David Mosberger
  2004-05-27 21:06 ` Jesse Barnes
@ 2004-05-27 21:29 ` David Mosberger
  2004-05-28 17:28 ` Jack Steiner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2004-05-27 21:29 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Thu, 27 May 2004 17:06:31 -0400, Jesse Barnes <jbarnes@engr.sgi.com> said:

  Jesse> On Thursday, May 27, 2004 4:46 pm, David Mosberger wrote:

  >> Wouldn't you want at least a cpu_relax() in that loop?

  Jesse> That'll cause the CPU to switch to the other thread if it's
  Jesse> SMT so it won't spin waiting for the access to complete?  If
  Jesse> so, then yes.

It is simply good practice to always use cpu_relax() in busy-loops.
It'll make sure the compiler doesn't get too clever (thanks to the
implied barrier()) and ensures that the CPU doesn't go nuts on the
busy-loop (e.g., a Pentium 4 core might overheat).  In the case
of ia64, cpu_relax() results in a "hint @pause" instruction which
is specified as:

 "Indicates to the processor that the currently executing stream is
  waiting, spinning, or doing low priority tasks.  This hint can be
  used by the processor to allocate more resources or time to another
  executing stream on the same processor."

	--david

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
                   ` (2 preceding siblings ...)
  2004-05-27 21:29 ` David Mosberger
@ 2004-05-28 17:28 ` Jack Steiner
  2004-05-28 17:35 ` Jack Steiner
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jack Steiner @ 2004-05-28 17:28 UTC (permalink / raw)
  To: linux-ia64

On Thu, May 27, 2004 at 05:06:31PM -0400, Jesse Barnes wrote:
> On Thursday, May 27, 2004 4:46 pm, David Mosberger wrote:
> > Wouldn't you want at least a cpu_relax() in that loop?
> 
> That'll cause the CPU to switch to the other thread if it's SMT so it won't 
> spin waiting for the access to complete?  If so, then yes.  Here's an updated 
> patch.

I didn't think the code would actually spin.

sn_mmiob() is something like:

    1:	ld.acq r8=...
    	;;
	cmp.eq p8,p0=r8,r9
   (p8)	br 1b

The "load" is an uncached load. Won't the pipeline stall on the cmp
waiting for data to arrive. I would not have expected that a cpu_relax()
would have been needed here. 

Data takes a few hundred nsec to be loaded from the chipset.


If we were spinning on a CACHED memory address, I agree that cpu_relax() 
is need.



> 
> sn_mmiob is a lightweight way to ensure PCI write ordering, intended to be 
> used as an alternative to doing a PIO read. Unfortunately, with the udelay() 
> in there, it ends up being slower than a PCI read on small configurations, so 
> remove it in favor of a simple cpu_relax() so that we're HT-friendly.
> 
> Thanks,
> Jesse



-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
                   ` (3 preceding siblings ...)
  2004-05-28 17:28 ` Jack Steiner
@ 2004-05-28 17:35 ` Jack Steiner
  2004-05-29  0:57 ` Seth, Rohit
  2004-05-29 14:31 ` Jack Steiner
  6 siblings, 0 replies; 8+ messages in thread
From: Jack Steiner @ 2004-05-28 17:35 UTC (permalink / raw)
  To: linux-ia64

On Fri, May 28, 2004 at 12:28:55PM -0500, Jack Steiner wrote:
> On Thu, May 27, 2004 at 05:06:31PM -0400, Jesse Barnes wrote:
> > On Thursday, May 27, 2004 4:46 pm, David Mosberger wrote:
> > > Wouldn't you want at least a cpu_relax() in that loop?
> > 
> > That'll cause the CPU to switch to the other thread if it's SMT so it won't 
> > spin waiting for the access to complete?  If so, then yes.  Here's an updated 
> > patch.

Whoops, I posted this response to the wrong mail. I was replying to the question
about whether the code should have a "cpu_relax()".

> 
> I didn't think the code would actually spin.
> 
> sn_mmiob() is something like:
> 
>     1:	ld.acq r8=...
>     	;;
> 	cmp.eq p8,p0=r8,r9
>    (p8)	br 1b
> 
> The "load" is an uncached load. Won't the pipeline stall on the cmp
> waiting for data to arrive. I would not have expected that a cpu_relax()
> would have been needed here. 
> 
> Data takes a few hundred nsec to be loaded from the chipset.
> 
> 
> If we were spinning on a CACHED memory address, I agree that cpu_relax() 
> is need.
> 
> 
> 
> > 
> > sn_mmiob is a lightweight way to ensure PCI write ordering, intended to be 
> > used as an alternative to doing a PIO read. Unfortunately, with the udelay() 
> > in there, it ends up being slower than a PCI read on small configurations, so 
> > remove it in favor of a simple cpu_relax() so that we're HT-friendly.
> > 
> > Thanks,
> > Jesse
> 
> 
> 
> -- 
> Thanks
> 
> Jack Steiner (steiner@sgi.com)          651-683-5302
> Principal Engineer                      SGI - Silicon Graphics, Inc.
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
                   ` (4 preceding siblings ...)
  2004-05-28 17:35 ` Jack Steiner
@ 2004-05-29  0:57 ` Seth, Rohit
  2004-05-29 14:31 ` Jack Steiner
  6 siblings, 0 replies; 8+ messages in thread
From: Seth, Rohit @ 2004-05-29  0:57 UTC (permalink / raw)
  To: linux-ia64

Jack Steiner <> wrote on Friday, May 28, 2004 10:35 AM:

> On Fri, May 28, 2004 at 12:28:55PM -0500, Jack Steiner wrote:
>> On Thu, May 27, 2004 at 05:06:31PM -0400, Jesse Barnes wrote:
>>> On Thursday, May 27, 2004 4:46 pm, David Mosberger wrote:
>>>> Wouldn't you want at least a cpu_relax() in that loop?
>>> 
>>> That'll cause the CPU to switch to the other thread if it's SMT so
>>> it won't spin waiting for the access to complete?  If so, then yes.
>>> Here's an updated patch.
> 
> Whoops, I posted this response to the wrong mail. I was replying to
> the question about whether the code should have a "cpu_relax()". 
> 
>> 
>> I didn't think the code would actually spin.
>> 
>> sn_mmiob() is something like:
>> 
>>     1:	ld.acq r8=...
>>     	;;
>> 	cmp.eq p8,p0=r8,r9
>>    (p8)	br 1b
>> 
>> The "load" is an uncached load. Won't the pipeline stall on the cmp
>> waiting for data to arrive. I would not have expected that a
>> cpu_relax() would have been needed here.
>> 

You are right that pipeline will stall on the cmp instruction.  But if
you could put the hint@pause between ld and cmp instructions then that
would allow current execution stream to give up resources to other
execution stream (if applicable) for the time that data takes to come
from main memory.

-rohit


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] don't udelay() in sn_mmiob
  2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
                   ` (5 preceding siblings ...)
  2004-05-29  0:57 ` Seth, Rohit
@ 2004-05-29 14:31 ` Jack Steiner
  6 siblings, 0 replies; 8+ messages in thread
From: Jack Steiner @ 2004-05-29 14:31 UTC (permalink / raw)
  To: linux-ia64

On Fri, May 28, 2004 at 05:57:33PM -0700, Seth, Rohit wrote:
> Jack Steiner <> wrote on Friday, May 28, 2004 10:35 AM:
> 
> > On Fri, May 28, 2004 at 12:28:55PM -0500, Jack Steiner wrote:
> >> On Thu, May 27, 2004 at 05:06:31PM -0400, Jesse Barnes wrote:
> >>> On Thursday, May 27, 2004 4:46 pm, David Mosberger wrote:
> >>>> Wouldn't you want at least a cpu_relax() in that loop?
> >>> 
> >>> That'll cause the CPU to switch to the other thread if it's SMT so
> >>> it won't spin waiting for the access to complete?  If so, then yes.
> >>> Here's an updated patch.
> > 
> > Whoops, I posted this response to the wrong mail. I was replying to
> > the question about whether the code should have a "cpu_relax()". 
> > 
> >> 
> >> I didn't think the code would actually spin.
> >> 
> >> sn_mmiob() is something like:
> >> 
> >>     1:	ld.acq r8=...
> >>     	;;
> >> 	cmp.eq p8,p0=r8,r9
> >>    (p8)	br 1b
> >> 
> >> The "load" is an uncached load. Won't the pipeline stall on the cmp
> >> waiting for data to arrive. I would not have expected that a
> >> cpu_relax() would have been needed here.
> >> 
> 
> You are right that pipeline will stall on the cmp instruction.  But if
> you could put the hint@pause between ld and cmp instructions then that
> would allow current execution stream to give up resources to other
> execution stream (if applicable) for the time that data takes to come
> from main memory.
> 
> -rohit

I thought the cpu would automatically switch on a cache miss & I assumed that
a switch would occur on a UC ref. Or am I thinking of a different processor?

If you follow the logic that is cpu_relax is useful in the sn_mmiob()
function, shouldn't there be a cpu_relax() after EVERY uncached load 
that is immediately (or closely) followed by an instruction that 
consumes the data.

What are the conditions that cause a switch between execution streams? 

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-05-29 14:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-26 21:49 [PATCH] don't udelay() in sn_mmiob Jesse Barnes
2004-05-27 20:46 ` David Mosberger
2004-05-27 21:06 ` Jesse Barnes
2004-05-27 21:29 ` David Mosberger
2004-05-28 17:28 ` Jack Steiner
2004-05-28 17:35 ` Jack Steiner
2004-05-29  0:57 ` Seth, Rohit
2004-05-29 14:31 ` Jack Steiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox