[Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]

public inbox for linux-kernel@vger.kernel.org

* [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
@ 2001-03-31  4:11 James Simmons
  2001-04-02 22:44 ` Alan Cox
  0 siblings, 1 reply; 27+ messages in thread
From: James Simmons @ 2001-03-31  4:11 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List, Linux Fbdev development list

>I took to using X, with a single screen size xterm to present the
>illusion of console mode.

Cute trick. I have seen some slow text mode cards. As time goes on it will
get worse since text mode support is not the prime goal anymore. Especially
now that you see graphical BIOS interfaces. Some graphics card manufacturers
have dropped VGA text mode support altogether. In the next 5 years you
will see the elimination of VGA text mode.

>Probably the lack of hardware area copies has something to do with
>this.

Yes, this is a problem. See my response to Paul about this. The only reason
I'm using MMX for the vesa framebuffer is that it has no acceleration. MMX
gives a big boost on genuine Intel chips. Other types of MMX are fast, but
they don't seem to be optimized for memory transfers like MMX on Intel
chips. I also have regular code that does all kinds of tricks to optimize
data transfers over the bus. It needs more testing, but in my comparison
between my Voodoo 3 accel engine and this code, it ran nearly as fast as
the accelerator at all depths :-)

Another idea for 2.5.X is to implement a font cache in video memory. Even
with AGP it is just too slow to constantly transfer font data over the bus.
Of course this requires a bit of work since we only have so much video
memory, but it is worth it for the performance improvement.
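
A minimal sketch of the font cache idea, not the fbcon implementation:
glyph bitmaps are uploaded once into a small pool of slots that stands in
for off-screen video memory, and later draws of the same character reuse
the slot instead of crossing the bus again. The struct and function names,
the 8x16 glyph size, the four-slot pool and the round-robin eviction are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GLYPH_BYTES 16   /* assumed: 8x16 monochrome glyph, one byte per row */
#define CACHE_SLOTS 4    /* assumed: tiny on purpose, to show eviction */

struct glyph_cache {
    uint8_t  vram[CACHE_SLOTS][GLYPH_BYTES]; /* stands in for video memory */
    int      code[CACHE_SLOTS];              /* cached char code, -1 if empty */
    unsigned next_victim;                    /* round-robin eviction cursor */
    unsigned uploads;                        /* transfers over the bus so far */
};

static void cache_init(struct glyph_cache *c)
{
    memset(c, 0, sizeof *c);
    for (int i = 0; i < CACHE_SLOTS; i++)
        c->code[i] = -1;
}

/* Return the slot holding `code`, uploading the glyph from the font data
 * in system memory only when it is not cached yet. */
static int cache_get(struct glyph_cache *c, int code, const uint8_t *font)
{
    assert(code >= 0);

    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->code[i] == code)
            return i;                        /* hit: no bus traffic */

    int slot = (int)c->next_victim;
    c->next_victim = (c->next_victim + 1) % CACHE_SLOTS;
    memcpy(c->vram[slot], font + (size_t)code * GLYPH_BYTES, GLYPH_BYTES);
    c->code[slot] = code;
    c->uploads++;                            /* miss: one transfer */
    return slot;
}
```

On a real card the memcpy() would be the only transfer over the bus; a hit
would turn into a video memory to video memory copy. How much memory to set
aside for the pool is exactly the "only so much video memory" trade-off.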

MS: (n) 1. A debilitating and surprisingly widespread affliction that
renders the sufferer barely able to perform the simplest task. 2. A disease.

James Simmons  [jsimmons@linux-fbdev.org]               ____/|
fbdev/console/gfx developer                             \ o.O|
http://www.linux-fbdev.org                               =(_)=
http://linuxgfx.sourceforge.net                            U
http://linuxconsole.sourceforge.net


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-03-31  4:11 [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?] James Simmons
@ 2001-04-02 22:44 ` Alan Cox
  2001-04-03  6:23   ` Geert Uytterhoeven
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2001-04-02 22:44 UTC (permalink / raw)
  To: James Simmons
  Cc: Jamie Lokier, Linux Kernel Mailing List,
	Linux Fbdev development list

> Yes, this is a problem. See my response to Paul about this. The only reason
> I'm using MMX for the vesa framebuffer is that it has no acceleration. MMX
> gives a big boost on genuine Intel chips. Other types of MMX are fast, but
> they don't seem to be optimized for memory transfers like MMX on Intel
> chips. I also have regular code that does all kinds of tricks to optimize

Then you are doing something badly wrong.

The MMX memcpy for CyrixIII and Athlon boxes is something like twice the
speed of rep movs. On most pentium II/III boxes the fast paths for rep movs
and for MMX are the same speed

Alan



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-02 22:44 ` Alan Cox
@ 2001-04-03  6:23   ` Geert Uytterhoeven
  2001-04-03 12:21     ` Alan Cox
  0 siblings, 1 reply; 27+ messages in thread
From: Geert Uytterhoeven @ 2001-04-03  6:23 UTC (permalink / raw)
  To: Alan Cox
  Cc: James Simmons, Jamie Lokier, Linux Kernel Mailing List,
	Linux Fbdev development list

On Mon, 2 Apr 2001, Alan Cox wrote:
> > Yes, this is a problem. See my response to Paul about this. The only reason
> > I'm using MMX for the vesa framebuffer is that it has no acceleration. MMX
> > gives a big boost on genuine Intel chips. Other types of MMX are fast, but
> > they don't seem to be optimized for memory transfers like MMX on Intel
> > chips. I also have regular code that does all kinds of tricks to optimize
> 
> Then you are doing something badly wrong.
> 
> The MMX memcpy for CyrixIII and Athlon boxes is something like twice the
> speed of rep movs. On most pentium II/III boxes the fast paths for rep movs
> and for MMX are the same speed

As long as you are copying in real memory. So the PCI bus or the host bridge
implementation may be the actual limit.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-03  6:23   ` Geert Uytterhoeven
@ 2001-04-03 12:21     ` Alan Cox
  2001-04-04  7:50       ` Eric W. Biederman
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2001-04-03 12:21 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alan Cox, James Simmons, Jamie Lokier, Linux Kernel Mailing List,
	Linux Fbdev development list

> > The MMX memcpy for CyrixIII and Athlon boxes is something like twice the
> > speed of rep movs. On most pentium II/III boxes the fast paths for rep movs
> > and for MMX are the same speed
> 
> As long as you are copying in real memory. So the PCI bus or the host bridge
> implementation may be the actual limit.

The CyrixIII sits on the same host bridges as the intel processors


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-03 12:21     ` Alan Cox
@ 2001-04-04  7:50       ` Eric W. Biederman
  2001-04-04  8:47         ` Jamie Lokier
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2001-04-04  7:50 UTC (permalink / raw)
  To: Alan Cox
  Cc: Geert Uytterhoeven, James Simmons, Jamie Lokier,
	Linux Kernel Mailing List, Linux Fbdev development list

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > > The MMX memcpy for CyrixIII and Athlon boxes is something like twice the
> > > speed of rep movs. On most pentium II/III boxes the fast paths for rep movs
> > > and for MMX are the same speed
> > 
> > As long as you are copying in real memory. So the PCI bus or the host bridge
> > implementation may be the actual limit.
> 
> The CyrixIII sits on the same host bridges as the intel processors

I don't know if it applies to this case, but one thing I have seen make
a noticeable difference is whether or not write-combining is enabled.
If we have only been enabling MTRRs for Intel, this could account
for it.

Eric


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-04  7:50       ` Eric W. Biederman
@ 2001-04-04  8:47         ` Jamie Lokier
  0 siblings, 0 replies; 27+ messages in thread
From: Jamie Lokier @ 2001-04-04  8:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Cox, Geert Uytterhoeven, James Simmons,
	Linux Kernel Mailing List, Linux Fbdev development list

Eric W. Biederman wrote:
> I don't know if it applies to this case but one thing I have seen make
> a noticeable difference is whether or not write-combining is enabled.
> If we have only been enabling MTRRs for Intel, this could account
> for it.

And on some laptops, even on Intel, MTRRs are not enabled for 2.5M
framebuffers.

-- Jamie


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
@ 2001-04-05  3:04 James Simmons
  2001-04-05 12:03 ` Eric W. Biederman
  0 siblings, 1 reply; 27+ messages in thread
From: James Simmons @ 2001-04-05  3:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Cox, Linux Fbdev development list, Linux Kernel Mailing List

>>> As long as you are copying in real memory. So the PCI bus or the host bridge
>>> implementation may be the actual limit.
>>
>> The CyrixIII sits on the same host bridges as the intel processors
>
>I don't know if it applies to this case but one thing I have seen make
>a noticeable difference is whether or not write-combining is enabled.
>If we have only been enabling MTRRs for Intel, this could account
>for it.

I think what Geert was trying to point out is the question of whether
MTRRs perform as well for normal memory over bus to video memory
transfers as for normal memory to normal memory transfers. MTRRs might
not be optimized for these kinds of transfers. I honestly can't say since
I haven't tried it. I brought the MMX book home from work, so I'm going
to be experimenting with it this weekend to find out. I'd really like to
compare the MMX performance to the word-aligned transfers over the bus I
have going. I had a bug in my soft accel code that prevented word
alignment. Once I fixed that bug I saw a 10-fold improvement in
rendering on the framebuffer. I'm not kidding about that improvement
either :-)
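
A minimal sketch of what a word-aligned transfer can look like, as an
illustration only and not the soft accel code referred to above: byte
stores bring the destination up to a 4-byte boundary, the bulk then goes
out as aligned 32-bit stores, and any remainder is finished byte by byte.
The function name is hypothetical, and the destination is assumed to be
memory that may legally be accessed as 32-bit words, such as a mapped
framebuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, using aligned 32-bit stores to dst
 * for the bulk of the data. Returns the number of 32-bit stores issued. */
static size_t copy_word_aligned(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = 0;

    /* Head: byte stores until dst sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: dst is aligned now, so each store is one full 32-bit word. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);      /* src may still be unaligned */
        *(uint32_t *)(void *)d = w;
        d += 4;
        s += 4;
        n -= 4;
        words++;
    }

    /* Tail: whatever is left, byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return words;
}
```

Copying 11 bytes to an address one byte past a word boundary, for example,
becomes three byte stores followed by two word stores, rather than a run of
misaligned accesses.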

Having MTRRs enabled always makes a difference. I'd like to try it with
and without. I will do some benchmarking.

MS: (n) 1. A debilitating and surprisingly widespread affliction that
renders the sufferer barely able to perform the simplest task. 2. A disease.

James Simmons  [jsimmons@linux-fbdev.org]               ____/|
fbdev/console/gfx developer                             \ o.O|
http://www.linux-fbdev.org                               =(_)=
http://linuxgfx.sourceforge.net                            U
http://linuxconsole.sourceforge.net


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-05  3:04 James Simmons
@ 2001-04-05 12:03 ` Eric W. Biederman
  2001-04-05 12:12   ` Geert Uytterhoeven
  0 siblings, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2001-04-05 12:03 UTC (permalink / raw)
  To: James Simmons
  Cc: Alan Cox, Linux Fbdev development list, Linux Kernel Mailing List

James Simmons <jsimmons@linux-fbdev.org> writes:

> >>> As long as you are copying in real memory. So the PCI bus or the host bridge
> >>> implementation may be the actual limit.
> >>
> >> The CyrixIII sits on the same host bridges as the intel processors
> >
> >I don't know if it applies to this case but one thing I have seen make
> >a noticeable difference is whether or not write-combining is enabled.
> >If we have only been enabling MTRRs for Intel, this could account
> >for it.
> 
> I think what Geert was trying to point out is the question of whether
> MTRRs perform as well for normal memory over bus to video memory
> transfers as for normal memory to normal memory transfers. MTRRs might
> not be optimized for these kinds of transfers. I honestly can't say since
> I haven't tried it. I brought the MMX book home from work, so I'm going
> to be experimenting with it this weekend to find out. I'd really like to
> compare the MMX performance to the word-aligned transfers over the bus I
> have going. I had a bug in my soft accel code that prevented word
> alignment. Once I fixed that bug I saw a 10-fold improvement in
> rendering on the framebuffer. I'm not kidding about that improvement
> either :-)
> 
> Having MTRRs enabled always makes a difference. I'd like to try it with
> and without. I will do some benchmarking.

While I'm thinking about it what we really should be using is the PAT
extension and not MTRR's.  The PAT extension allows you to set the
attributes per page so you don't have the resource contention you do
with MTRR's.  I can just imagine the performance challenges right now
if you try to do a multi-head where multi > number of free MTRR's.

What happens when write-combining is active is that close adjacent
writes are batched together.  Without write-combining you tend to get
32bit writes on a bus with a word size of 64 or more bits.  By the way
does anyone know who didn't implement MTRR's or the equivalent on
alpha so we can shoot them?
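
A toy model of the batching described above, under stated assumptions: a
single combining buffer collects consecutive 32-bit stores that fall in the
same aligned line and goes out on the bus as one transaction when a store
lands elsewhere. The function name and the single-buffer behaviour are
illustrative only; real hardware has several buffers and its own flush
rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count bus transactions for a stream of 32-bit stores.
 * line_bytes == 0 models no write-combining: every store goes out alone.
 * Otherwise line_bytes is the (power of two) size of the combining line. */
static size_t bus_transactions(const uint32_t *addr, size_t n,
                               uint32_t line_bytes)
{
    if (line_bytes == 0)
        return n;

    assert(line_bytes >= 8 && (line_bytes & (line_bytes - 1)) == 0);

    size_t count = 0;
    uint32_t open_line = 0;
    int have_line = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t line = addr[i] & ~(line_bytes - 1);
        if (!have_line || line != open_line) {
            count++;                  /* previous line is flushed, new one opens */
            open_line = line;
            have_line = 1;
        }
    }
    return count;
}
```

In this model sixteen sequential 32-bit stores cost sixteen transactions
uncombined, eight on a 64-bit line and two with a 32-byte line, while
stores that alternate between distant addresses gain nothing.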

Eric


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-05 12:03 ` Eric W. Biederman
@ 2001-04-05 12:12   ` Geert Uytterhoeven
  2001-04-05 13:15     ` Maciej W. Rozycki
  0 siblings, 1 reply; 27+ messages in thread
From: Geert Uytterhoeven @ 2001-04-05 12:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Simmons, Alan Cox, Linux Fbdev development list,
	Linux Kernel Mailing List

On 5 Apr 2001, Eric W. Biederman wrote:
> 32bit writes on a bus with a word size of 64 or more bits.  By the way
> does anyone know who didn't implement MTRR's or the equivalent on
> alpha so we can shoot them?

People never get shot in Open Source projects. Not when they write buggy code,
not when they don't implement some features.

Gr{oetje,eeting}s,

						Geert

P.S. Perhaps ESR tends to disagree? ;-)
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-05 12:12   ` Geert Uytterhoeven
@ 2001-04-05 13:15     ` Maciej W. Rozycki
  2001-04-05 18:20       ` Eric W. Biederman
  0 siblings, 1 reply; 27+ messages in thread
From: Maciej W. Rozycki @ 2001-04-05 13:15 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Eric W. Biederman, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On Thu, 5 Apr 2001, Geert Uytterhoeven wrote:

> > 32bit writes on a bus with a word size of 64 or more bits.  By the way
> > does anyone know who didn't implement MTRR's or the equivalent on
> > alpha so we can shoot them?
> 
> People never get shot in Open Source projects. Not when they write buggy code,
> not when they don't implement some features.

 Was DEC Alpha an Open Source project? ;-)

 Memory barriers are more RISC-styled and more flexible anyway (e.g. you
can't run out of them ;-) ), though they require a greater care when
writing code.  MTRRs are the Intel style of complicating designs.  Still
they are probably a reasonable solution to preserve DOS compatibility. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-05 13:15     ` Maciej W. Rozycki
@ 2001-04-05 18:20       ` Eric W. Biederman
  2001-04-06 10:09         ` Ivan Kokshaysky
  2001-04-06 17:07         ` Maciej W. Rozycki
  0 siblings, 2 replies; 27+ messages in thread
From: Eric W. Biederman @ 2001-04-05 18:20 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

"Maciej W. Rozycki" <macro@ds2.pg.gda.pl> writes:

> On Thu, 5 Apr 2001, Geert Uytterhoeven wrote:
> 
> > > 32bit writes on a bus with a word size of 64 or more bits.  By the way
> > > does anyone know who didn't implement MTRR's or the equivalent on
> > > alpha so we can shoot them?
> > 
> > People never get shot in Open Source projects. Not when they write buggy code,
> 
> > not when they don't implement some features.
> 
>  Was DEC Alpha an Open Source project? ;-)
> 
>  Memory barriers are more RISC-styled and more flexible anyway (e.g. you
> can't run out of them ;-) ), though they require a greater care when
> writing code.  MTRRs are the Intel style of complicating designs.  Still
> they are probably a reasonable solution to preserve DOS compatibility. 

The point is on the Alpha all ram is always cached, and i/o space is
completely uncached.  You cannot do write-combining for video card
memory.  Memory barriers are a separate issue.  On the alpha the
natural way to implement it would be in the page table fill code.
Memory barriers are o.k. but they really don't help the case when what
you want to do is read the latest value out of a pci register.

Eric





* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-05 18:20       ` Eric W. Biederman
@ 2001-04-06 10:09         ` Ivan Kokshaysky
  2001-04-06 13:19           ` Eric W. Biederman
  2001-04-06 17:13           ` Maciej W. Rozycki
  2001-04-06 17:07         ` Maciej W. Rozycki
  1 sibling, 2 replies; 27+ messages in thread
From: Ivan Kokshaysky @ 2001-04-06 10:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Maciej W. Rozycki, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On Thu, Apr 05, 2001 at 12:20:22PM -0600, Eric W. Biederman wrote:
> The point is on the Alpha all ram is always cached, and i/o space is
> completely uncached.  You cannot do write-combining for video card
> memory.

Incorrect. Alphas have write buffers - 6x32 bytes on ev5 and
4x64 on ev6, IIRC. So alphas do write up to 32 or 64 bytes
in a single pci transaction.

>  Memory barriers are a separate issue.  On the alpha the
> natural way to implement it would be in the page table fill code.
> Memory barriers are o.k. but they really don't help the case when what
> you want to do is read the latest value out of a pci register.  

You don't need memory barrier for that. "Write memory barriers" are
used to ensure correct write order, and "memory barriers" are used
to ensure that all pending reads/writes will complete before next read
or write.

Ivan.


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-06 10:09         ` Ivan Kokshaysky
@ 2001-04-06 13:19           ` Eric W. Biederman
  2001-04-06 17:27             ` Maciej W. Rozycki
  2001-04-06 17:13           ` Maciej W. Rozycki
  1 sibling, 1 reply; 27+ messages in thread
From: Eric W. Biederman @ 2001-04-06 13:19 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

Ivan Kokshaysky <ink@jurassic.park.msu.ru> writes:

> On Thu, Apr 05, 2001 at 12:20:22PM -0600, Eric W. Biederman wrote:
> > The point is on the Alpha all ram is always cached, and i/o space is
> > completely uncached.  You cannot do write-combining for video card
> > memory.
> 
> Incorrect. Alphas have write buffers - 6x32 bytes on ev5 and
> 4x64 on ev6, IIRC. So alphas do write up to 32 or 64 bytes
> in a single pci transaction.

Sorry, I was thinking of the current alpha, the ev6.  So what I'm saying
doesn't apply to the alpha architecture in general, just its current
specific implementation.

Yes, for the ev6 you have write buffers, but you can't say "just use the
write buffers" on an arbitrary area of memory.

> >  Memory barriers are a separate issue.  On the alpha the
> > natural way to implement it would be in the page table fill code.
> > Memory barriers are o.k. but they really don't help the case when what
> > you want to do is read the latest value out of a pci register.  
> 
> You don't need memory barrier for that. "Write memory barriers" are
> used to ensure correct write order, and "memory barriers" are used
> to ensure that all pending reads/writes will complete before next read
> or write.

100% Agreed. That is what I was saying.  What the ev6 doesn't have
is the ability to say this: I am using this area of the memory address
space in a particular way: don't cache it but do write combining on it.

Theoretically you could use memory barrier instructions for this but
it would require an I/O bus that supported a cache coherency
protocol.  At which point the problem moves down to your PCI bus
controller.

I recall on the ev6 all memory accesses to locations with bit 40 set
are always to IO space, are never cached, and are never write buffered.
Accesses to memory locations with bit 40 clear are always to RAM, are
always cached, and are always write buffered.

With the high I/O bus speeds unless you are trying to push things to
the absolute limit you are unlikely to see the IO accesses being the
bottleneck in or out to a PCI device.  At which point DMA probably
already compensates, for most devices.

IIRC For PCI card IO regions where you need maximum IO speed through
the memory address space (like frame buffers) the ev6 falls down.

I really like the alpha, which is why this galls me so much about the
ev6.  I hope they have it fixed for the ev7 or ev8.  If those chips
ever actually arrive.  But as the ev7 is just supposed to be the ev6
core with an on chip cache I don't have much hope.

Eric


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-06 13:19           ` Eric W. Biederman
@ 2001-04-06 17:27             ` Maciej W. Rozycki
  2001-04-06 18:34               ` Andrea Arcangeli
  0 siblings, 1 reply; 27+ messages in thread
From: Maciej W. Rozycki @ 2001-04-06 17:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ivan Kokshaysky, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On 6 Apr 2001, Eric W. Biederman wrote:

> I recall on the ev6 all memory accesses to locations with bit 40 set 
> are always to IO space, are never cached, and are never write buffered.

 If that is the case then EV6 is seriously flawed.  You normally have
non-cached locations buffered (since you don't always need peripheral
device accesses to be posted immediately) and can force a writeback with a
memory barrier.  I don't have my 21264 handbook handy, so I can't check
EV6 details at the moment, especially why it is different.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-06 17:27             ` Maciej W. Rozycki
@ 2001-04-06 18:34               ` Andrea Arcangeli
  2001-04-06 19:31                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 27+ messages in thread
From: Andrea Arcangeli @ 2001-04-06 18:34 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Eric W. Biederman, Ivan Kokshaysky, Geert Uytterhoeven,
	James Simmons, Alan Cox, Linux Fbdev development list,
	Linux Kernel Mailing List

On Fri, Apr 06, 2001 at 07:27:24PM +0200, Maciej W. Rozycki wrote:
> [..] You normally have
> non-cached locations buffered (since you don't always need peripheral
> device accesses to be posted immediately) and can force a writeback with a
> memory barrier. [..]

ev6 works the way you described AFAIK (to flush the write buffer you can use
wmb(); note that wmb() semantics don't require the cpu to really "flush" but
just to keep writes ordered across other mb or wmb, but it's basically the
same from a software point of view and flushing the write buffer synchronously
obviously provides those semantics).  I didn't follow very closely the
previous part of the thread so I'm not sure what the issue is.

Andrea


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-06 18:34               ` Andrea Arcangeli
@ 2001-04-06 19:31                 ` Maciej W. Rozycki
  0 siblings, 0 replies; 27+ messages in thread
From: Maciej W. Rozycki @ 2001-04-06 19:31 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Eric W. Biederman, Ivan Kokshaysky, Geert Uytterhoeven,
	James Simmons, Alan Cox, Linux Fbdev development list,
	Linux Kernel Mailing List

On Fri, 6 Apr 2001, Andrea Arcangeli wrote:

> ev6 works the way you described AFAIK (to flush the write buffer you can use

 Thanks for the clarification -- you made me calm down.

> wmb(); note that wmb() semantics don't require the cpu to really "flush" but
> just to keep writes ordered across other mb or wmb, but it's basically the
> same from a software point of view and flushing the write buffer synchronously
> obviously provides those semantics).  I didn't follow very closely the

 Of course -- you only want to do mb (and not wmb) if you need to meet
hw's specific timing or you want to perform a read from a volatile
register of a peripheral device. 

> previous part of the thread so I'm not sure what the issue is.

 Someone complained of Alpha not having Intel-style MTRRs to set write
combining for fb memory...

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-06 10:09         ` Ivan Kokshaysky
  2001-04-06 13:19           ` Eric W. Biederman
@ 2001-04-06 17:13           ` Maciej W. Rozycki
  2001-04-08 18:11             ` Ivan Kokshaysky
  1 sibling, 1 reply; 27+ messages in thread
From: Maciej W. Rozycki @ 2001-04-06 17:13 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Eric W. Biederman, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On Fri, 6 Apr 2001, Ivan Kokshaysky wrote:

> >  Memory barriers are a separate issue.  On the alpha the
> > natural way to implement it would be in the page table fill code.
> > Memory barriers are o.k. but they really don't help the case when what
> > you want to do is read the latest value out of a pci register.  
> 
> You don't need memory barrier for that. "Write memory barriers" are
> used to ensure correct write order, and "memory barriers" are used
> to ensure that all pending reads/writes will complete before next read
> or write.

 You do.  PCI-space registers are volatile and they may change depending
on what was written (or read) previously.  A memory barrier before a PCI
read will ensure you get a value that is relevant to previous code
actions.  Without a barrier you may get pretty much anything, depending on
which of previous writes managed to complete before. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-06 17:13           ` Maciej W. Rozycki
@ 2001-04-08 18:11             ` Ivan Kokshaysky
  2001-04-09 10:02               ` Maciej W. Rozycki
  0 siblings, 1 reply; 27+ messages in thread
From: Ivan Kokshaysky @ 2001-04-08 18:11 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Eric W. Biederman, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On Fri, Apr 06, 2001 at 07:13:21PM +0200, Maciej W. Rozycki wrote:
>  You do.  PCI-space registers are volatile and they may change depending
> on what was written (or read) previously.  A memory barrier before a PCI
> read will ensure you get a value that is relevant to previous code
> actions.  Without a barrier you may get pretty much anything, depending on
> which of previous writes managed to complete before. 

Of course. I meant that if you are reading, for example, some status register
in a loop waiting for "ready bit" set, the memory barrier won't help you
to notice this event any faster. Actually you'll notice that *later*, as
"mb" is expensive.

Well, here is some info on ev6 IO write buffers - they are a bit different
than ev4/ev5 ones.
Merging rules:
 - byte/word stores aren't allowed to merge into a write buffer;
 - different size stores (32- and 64-bit) aren't allowed to merge;
 - addresses must be in ascending order and non-overlapping,
   but not necessarily consecutive.
The I/O register merge window close (ie write-buffer flushing) occurs after
 - mb and wmb instructions;
 - IO-space load instruction (!);
 - after 1024 cycles if there were no IO-space stores.
Store requests are sent offchip in program order (!).
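
The merging rules listed above can be written down as a small predicate. A
sketch only: the type and function names are made up for illustration, and
the check that both stores fall in the same aligned 64-byte block is an
assumption taken from the "4x64 on ev6" figure earlier in the thread, not
something the list states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WBUF_BYTES 64u  /* assumption: one buffer spans an aligned 64-byte block */

struct io_store {
    uint64_t addr;   /* IO-space address of the store */
    unsigned size;   /* store width in bytes: 1, 2, 4 or 8 */
};

/* May `next` merge into the write buffer that currently ends with `prev`? */
static bool may_merge(struct io_store prev, struct io_store next)
{
    assert(prev.size != 0 && next.size != 0);

    if (prev.size < 4 || next.size < 4)
        return false;                 /* byte/word stores never merge */
    if (prev.size != next.size)
        return false;                 /* 32- and 64-bit stores don't mix */
    if (next.addr < prev.addr + prev.size)
        return false;                 /* must ascend without overlapping */

    /* Assumed, not part of the listed rules: both stores share one buffer. */
    return prev.addr / WBUF_BYTES == next.addr / WBUF_BYTES;
}
```

Note that a gap between the two stores is accepted, matching "not
necessarily consecutive", while a descending or overlapping address is not.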

All this explains, in particular, why XFree86-4.0 worked on ev6 without
memory barriers of any kind, while it crashed badly on ev4/ev5.

Ivan.


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-08 18:11             ` Ivan Kokshaysky
@ 2001-04-09 10:02               ` Maciej W. Rozycki
  2001-04-09 11:05                 ` Ivan Kokshaysky
  0 siblings, 1 reply; 27+ messages in thread
From: Maciej W. Rozycki @ 2001-04-09 10:02 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Eric W. Biederman, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On Sun, 8 Apr 2001, Ivan Kokshaysky wrote:

> Of course. I meant that if you are reading, for example, some status register
> in a loop waiting for "ready bit" set, the memory barrier won't help you
> to notice this event any faster. Actually you'll notice that *later*, as
> "mb" is expensive.

 I think you need an mb here.  To force synchronization with other CPUs.
Unless you know you are UP or there is no possibility another CPU may
access the relevant device.

 Of course mbs hit performance but it's a trade off for coherency. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +



* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-09 10:02               ` Maciej W. Rozycki
@ 2001-04-09 11:05                 ` Ivan Kokshaysky
  0 siblings, 0 replies; 27+ messages in thread
From: Ivan Kokshaysky @ 2001-04-09 11:05 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Eric W. Biederman, Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On Mon, Apr 09, 2001 at 12:02:54PM +0200, Maciej W. Rozycki wrote:
>  I think you need an mb here.  To force synchronization with other CPUs.
> Unless you know you are UP or there is no possibility another CPU may
> access the relevant device.

Yes - in most cases you need synchronization at a higher level.
For instance, you don't want other CPUs accessing the device while
you are sending command sequences to it.

Ivan.


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-05 18:20       ` Eric W. Biederman
  2001-04-06 10:09         ` Ivan Kokshaysky
@ 2001-04-06 17:07         ` Maciej W. Rozycki
  1 sibling, 0 replies; 27+ messages in thread
From: Maciej W. Rozycki @ 2001-04-06 17:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Geert Uytterhoeven, James Simmons, Alan Cox,
	Linux Fbdev development list, Linux Kernel Mailing List

On 5 Apr 2001, Eric W. Biederman wrote:

> The point is on the Alpha all ram is always cached, and i/o space is
> completely uncached.  You cannot do write-combining for video card

 You don't want to cache fb memory, do you?  All you want is write
combining and you achieve it with memory barriers.  You write to fb memory
space whatever you need to and write buffers actually deliver data to fb
memory whenever the bus is idle or they get filled up.  When you finally
decide you wrote all data and you want to ensure it actually reaches the fb
memory before you perform an operation (say you send a command to fb's
support circuitry) you issue a write memory barrier.  Or a memory barrier,
if you want to ensure the data reaches the fb memory ASAP.

 In other words, you have write-combining by default and request
write-through explicitly.
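
A minimal userspace sketch of that pattern, under stated assumptions: the
`fake_fb` structure, `FAKE_CMD_BLIT` and `sketch_wmb()` are made up for
illustration, and a C11 release fence stands in for the kernel's wmb(). The
fence only models ordering between threads; it says nothing about how a real
bus or write buffer behaves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a card: pixel memory plus one command register. */
struct fake_fb {
    uint32_t pixels[64];       /* stands in for write-combined fb memory */
    _Atomic uint32_t command;  /* stands in for an accelerator register */
};

#define FAKE_CMD_BLIT 1u       /* made-up command code */

/* Assumption: a release fence plays the role of wmb() in this sketch. */
static inline void sketch_wmb(void)
{
    atomic_thread_fence(memory_order_release);
}

static void fill_then_kick(struct fake_fb *fb, uint32_t colour, size_t count)
{
    assert(count <= sizeof(fb->pixels) / sizeof(fb->pixels[0]));

    /* Plain stores: nothing here forces them out in any particular order. */
    for (size_t i = 0; i < count; i++)
        fb->pixels[i] = colour;

    /* Order every pixel store before the command that depends on them. */
    sketch_wmb();

    atomic_store_explicit(&fb->command, FAKE_CMD_BLIT, memory_order_relaxed);
}
```

The point of the shape is that the barrier is issued once, after the whole
batch of pixel writes, rather than after each store.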

> memory.  Memory barriers are a separate issue.  On the alpha the
> natural way to implement it would be in the page table fill code.

 Please forgive me -- I can't see how this is related to write combining.

> Memory barriers are o.k. but they really don't help the case when what
> you want to do is read the latest value out of a PCI register.

 They do -- you issue an mb and you are sure all pending writes reached
the involved PCI hw. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
@ 2001-04-03  2:04 James Simmons
  0 siblings, 0 replies; 27+ messages in thread
From: James Simmons @ 2001-04-03  2:04 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List, Linux Fbdev development list

>Is it possible that "jump scroll" would provide more performance benefit
>than an accelerated driver anyway?

I wouldn't rule it out. If someone wants to whip up some code I would have
no problem testing it to see if it is worth it.
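
A toy sketch of what "jump scroll" could look like, assuming it means
batching several pending newlines into one scroll operation. The `screen`
structure, the 4x8 size and the function names are all made up for
illustration; this is not fbcon code.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 4, COLS = 8 };   /* tiny hypothetical text screen */

struct screen {
    char cell[ROWS][COLS];
    unsigned copies;           /* number of scroll operations performed */
};

/* Scroll up by n lines in one operation, blanking the exposed rows. */
static void scroll_up(struct screen *s, int n)
{
    assert(n >= 0);
    if (n == 0)
        return;
    if (n < ROWS)
        memmove(s->cell[0], s->cell[n], (size_t)(ROWS - n) * COLS);
    else
        n = ROWS;              /* everything scrolls off: just clear */
    memset(s->cell[ROWS - n], ' ', (size_t)n * COLS);
    s->copies++;
}

/* Line at a time: one scroll operation per newline. */
static void smooth_scroll(struct screen *s, int newlines)
{
    for (int i = 0; i < newlines; i++)
        scroll_up(s, 1);
}

/* Jump scroll: all pending newlines handled by a single scroll operation. */
static void jump_scroll(struct screen *s, int newlines)
{
    scroll_up(s, newlines);
}
```

Both paths leave the same screen contents; the difference is how many times
the (slow, unaccelerated) copy runs, which is where any gain would come from.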

>Seeing as you bring up this topic of writing a 9525 driver.  It seems to
>me rather wasteful that you (collectively linux framebuffer authors),
>XFree86 and Berlin are all writing drivers for the same, hugely diverse
>class of hardware, to support more or less the same ops on the hardware.
>
>Isn't it possible to pool the development effort of video drivers?  Doesn't
>X require basically the same set of operations as the kernel?  I.e.,
>initialise the card and video mode (usually the very complex part); do
>some rendering ops (usually fairly simple).  Sure, X provides a few more
>kinds of rendering op, but that part of the code is usually much simpler
>and smaller than the initialisation code.

Well the goal of each is very much different. Fbcon was developed to deal
with the fact that most modern video hardware doesn't support text modes
but graphical modes instead. VGA text is slowly going away. Since our goal
is to emulate a text console, we only have to provide the basic support
needed for that. We need to

1) Draw basic text -> glyph operations.

2) Scroll the screen -> hardware panning or a copy area operation.

3) Scroll a region of the screen -> copy area operation.

4) Clear the display or a region of the display -> fillrect.

5) Set the color palette.

6) Manage a hardware cursor.

7) Manage the current resolution for VC switching or a mode change via
   VT_RESIZE or TIOCSWINSZ.
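
The list above could be sketched roughly as follows. Everything here is
hypothetical: `sketch_fb`, `sketch_con_ops` and the `sw_*` helpers are
illustration names, not kernel structures, and the framebuffer is assumed to
be a plain 8 bpp linear buffer. Only the unaccelerated copy area and fillrect
paths are filled in, along with a scroll built from them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8 bpp linear framebuffer. */
struct sketch_fb {
    uint8_t *mem;
    int width, height;   /* in pixels */
    int stride;          /* bytes per scanline */
};

/* Hypothetical hook table, one entry per item in the list above. */
struct sketch_con_ops {
    void (*draw_glyph)(struct sketch_fb *, int x, int y, const uint8_t *bits);
    void (*copyarea)(struct sketch_fb *, int dx, int dy, int sx, int sy,
                     int w, int h);
    void (*fillrect)(struct sketch_fb *, int x, int y, int w, int h,
                     uint8_t colour);
    void (*set_palette)(struct sketch_fb *, int index, uint32_t rgb);
    void (*move_cursor)(struct sketch_fb *, int x, int y);
    int  (*set_mode)(struct sketch_fb *, int width, int height);
};

/* Software fillrect: one memset per scanline. */
static void sw_fillrect(struct sketch_fb *fb, int x, int y, int w, int h,
                        uint8_t colour)
{
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= fb->width && y + h <= fb->height);
    for (int row = 0; row < h; row++)
        memset(fb->mem + (size_t)(y + row) * fb->stride + x, colour, (size_t)w);
}

/* Software copy area: row order is chosen so overlapping copies are safe. */
static void sw_copyarea(struct sketch_fb *fb, int dx, int dy, int sx, int sy,
                        int w, int h)
{
    assert(dx >= 0 && dy >= 0 && sx >= 0 && sy >= 0 && w >= 0 && h >= 0);
    assert(dx + w <= fb->width && sx + w <= fb->width);
    assert(dy + h <= fb->height && sy + h <= fb->height);
    for (int i = 0; i < h; i++) {
        int row = (dy <= sy) ? i : h - 1 - i;
        memmove(fb->mem + (size_t)(dy + row) * fb->stride + dx,
                fb->mem + (size_t)(sy + row) * fb->stride + sx, (size_t)w);
    }
}

/* Scroll the whole screen up: copy area, then clear the exposed band. */
static void sketch_scroll_up(struct sketch_fb *fb, int lines, uint8_t bg)
{
    assert(lines >= 0 && lines <= fb->height);
    sw_copyarea(fb, 0, 0, 0, lines, fb->width, fb->height - lines);
    sw_fillrect(fb, 0, fb->height - lines, fb->width, lines, bg);
}
```

A driver with an accel engine would presumably point the same hooks at the
hardware instead of these memmove/memset loops.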

So fbcon exists out of necessity. By X you mean XFree86, which is really an
OS in itself. Its goal is to do everything itself so it can run everywhere
known to mankind. As for Berlin, I don't know the code so I can't say.
As people are finding out, XFree86 doing everything itself has its
issues. A good example is the classic problem of X dying so that you have
to reboot the machine. Also, when you exit X to the console under heavy
load, you don't get text mode back. Right now it's tough luck and you
just reboot your machine. An M$ solution, but people have been doing it
so long they don't mind it. I hope to fix those problems for 2.5.X.
As you can see, I think the OS should handle the transfer from console mode
to text mode and vice versa. Now for programming the accel engine to do
graphics in userland: there is nothing wrong with each doing its own
thing. What does matter is that there is a GUI-independent kernel manager
of the graphics engine state. DRI attempts to handle this.

MS: (n) 1. A debilitating and surprisingly widespread affliction that
renders the sufferer barely able to perform the simplest task. 2. A disease.

James Simmons  [jsimmons@linux-fbdev.org]               ____/|
fbdev/console/gfx developer                             \ o.O|
http://www.linux-fbdev.org                               =(_)=
http://linuxgfx.sourceforge.net                            U
http://linuxconsole.sourceforge.net

* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
@ 2001-04-01 14:54 James Simmons
  2001-04-01 21:35 ` Jamie Lokier
  0 siblings, 1 reply; 27+ messages in thread
From: James Simmons @ 2001-04-01 14:54 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List, Linux Fbdev development list


>No, it's the Trident Cyber9525

Sorry. I only have an early driver for the Trident 9750 and 9850. There is
a group working on Trident framebuffers.

MS: (n) 1. A debilitating and surprisingly widespread affliction that
renders the sufferer barely able to perform the simplest task. 2. A disease.

James Simmons  [jsimmons@linux-fbdev.org]               ____/|
fbdev/console/gfx developer                             \ o.O|
http://www.linux-fbdev.org                               =(_)=
http://linuxgfx.sourceforge.net                            U
http://linuxconsole.sourceforge.net


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-04-01 14:54 James Simmons
@ 2001-04-01 21:35 ` Jamie Lokier
  0 siblings, 0 replies; 27+ messages in thread
From: Jamie Lokier @ 2001-04-01 21:35 UTC (permalink / raw)
  To: James Simmons; +Cc: Linux Kernel Mailing List, Linux Fbdev development list

James Simmons wrote:
> >No, it's the Trident Cyber9525
> 
> Sorry. I only have an early driver for the Trident 9750 and 9850. There is
> a group working on Trident framebuffers.

Is it possible that "jump scroll" would provide more performance benefit
than an accelerated driver anyway?

Seeing as you bring up this topic of writing a 9525 driver.  It seems to
me rather wasteful that you (collectively linux framebuffer authors),
XFree86 and Berlin are all writing drivers for the same, hugely diverse
class of hardware, to support more or less the same ops on the hardware.

Isn't it possible to pool the development effort of video drivers?  Doesn't
X require basically the same set of operations as the kernel?  I.e.,
initialise the card and video mode (usually the very complex part); do
some rendering ops (usually fairly simple).  Sure, X provides a few more
kinds of rendering op, but that part of the code is usually much simpler
and smaller than the initialisation code.

Sorry if this sounds insulting -- it isn't intended that way.  I don't
really know what is involved in writing video drivers.  All I am seeing
is an _apparent_ reinventing of a rather complex wheel, when it's hard
enough as it is to keep up with all the different cards.

thanks,
-- Jamie

* [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
@ 2001-03-31  4:15 James Simmons
  0 siblings, 0 replies; 27+ messages in thread
From: James Simmons @ 2001-03-31  4:15 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List, Linux Fbdev development list

>The console driver does not actually use 2.5MB.  Does it make sense to
>use an MTRR for the smaller power-of-two region?

If we implement a font cache in the future it could. Also that extra
memory is used to allow scrollback. We could break up the region into
power-of-two pieces: have a*2^n + b*2^(n-1) + c*2^(n-2) + ... = 2.5 MB.
Isn't math grand :-)
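
As a worked example of that sum, the sketch below splits a size into
power-of-two chunks, largest first. `split_pow2` is a made-up helper, not an
MTRR API, and it only does the arithmetic: it ignores the fact that each
real MTRR range must also sit on a base aligned to its own size, and that
there are only a few MTRR registers to spend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split `size` (in bytes) into power-of-two chunks, largest first.
 * Up to `max` chunk sizes are stored in `out`; the return value is the
 * number of chunks the size needs, which may exceed `max`.
 */
static size_t split_pow2(uint64_t size, uint64_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);

    for (int bit = 63; bit >= 0; bit--) {
        uint64_t chunk = (uint64_t)1 << bit;

        if (size & chunk) {        /* this power of two is part of the sum */
            if (n < max)
                out[n] = chunk;
            n++;
        }
    }
    return n;
}
```

For 2.5 MB the coefficients work out to one 2 MB piece plus one 512 KB
piece, so two ranges would cover it.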

MS: (n) 1. A debilitating and surprisingly widespread affliction that
renders the sufferer barely able to perform the simplest task. 2. A disease.

James Simmons  [jsimmons@linux-fbdev.org]               ____/|
fbdev/console/gfx developer                             \ o.O|
http://www.linux-fbdev.org                               =(_)=
http://linuxgfx.sourceforge.net                            U
http://linuxconsole.sourceforge.net

* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
@ 2001-03-31  3:47 James Simmons
  2001-03-31 14:15 ` Jamie Lokier
  0 siblings, 1 reply; 27+ messages in thread
From: James Simmons @ 2001-03-31  3:47 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Linux Kernel Mailing List, Linux Fbdev development list


> > You have the same Toshiba Satellite as me, right?
>
> Yes

Is this the NeoMagic chipset?

MS: (n) 1. A debilitating and surprisingly widespread affliction that
renders the sufferer barely able to perform the simplest task. 2. A disease.

James Simmons  [jsimmons@linux-fbdev.org]               ____/|
fbdev/console/gfx developer                             \ o.O|
http://www.linux-fbdev.org                               =(_)=
http://linuxgfx.sourceforge.net                            U
http://linuxconsole.sourceforge.net


* Re: [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?]
  2001-03-31  3:47 James Simmons
@ 2001-03-31 14:15 ` Jamie Lokier
  0 siblings, 0 replies; 27+ messages in thread
From: Jamie Lokier @ 2001-03-31 14:15 UTC (permalink / raw)
  To: James Simmons; +Cc: Linux Kernel Mailing List, Linux Fbdev development list

James Simmons wrote:
> > > You have the same Toshiba Satellite as me, right?
> >
> > Yes
> 
> Is this the NeoMagic chipset?

No, it's the Trident Cyber9525

-- Jamie

end of thread

Thread overview: 27+ messages
2001-03-31  4:11 [Linux-fbdev-devel] Re: fbcon slowness [was NTP on 2.4.2?] James Simmons
2001-04-02 22:44 ` Alan Cox
2001-04-03  6:23   ` Geert Uytterhoeven
2001-04-03 12:21     ` Alan Cox
2001-04-04  7:50       ` Eric W. Biederman
2001-04-04  8:47         ` Jamie Lokier
  -- strict thread matches above, loose matches on Subject: below --
2001-04-05  3:04 James Simmons
2001-04-05 12:03 ` Eric W. Biederman
2001-04-05 12:12   ` Geert Uytterhoeven
2001-04-05 13:15     ` Maciej W. Rozycki
2001-04-05 18:20       ` Eric W. Biederman
2001-04-06 10:09         ` Ivan Kokshaysky
2001-04-06 13:19           ` Eric W. Biederman
2001-04-06 17:27             ` Maciej W. Rozycki
2001-04-06 18:34               ` Andrea Arcangeli
2001-04-06 19:31                 ` Maciej W. Rozycki
2001-04-06 17:13           ` Maciej W. Rozycki
2001-04-08 18:11             ` Ivan Kokshaysky
2001-04-09 10:02               ` Maciej W. Rozycki
2001-04-09 11:05                 ` Ivan Kokshaysky
2001-04-06 17:07         ` Maciej W. Rozycki
2001-04-03  2:04 James Simmons
2001-04-01 14:54 James Simmons
2001-04-01 21:35 ` Jamie Lokier
2001-03-31  4:15 James Simmons
2001-03-31  3:47 James Simmons
2001-03-31 14:15 ` Jamie Lokier
