linuxppc-dev.lists.ozlabs.org archive mirror
* recent fixes in devel tree
@ 2000-09-16  5:04 Paul Mackerras
  2000-09-16 11:16 ` Holger Bettag
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Mackerras @ 2000-09-16  5:04 UTC (permalink / raw)
  To: linuxppc-dev


I've been fixing a few things in the development kernel (2.4.0-test8)
lately:

PCMCIA.  I have fixed an endianness bug in the pcmcia/cardbus support
in the kernel, and added some code to get the interrupt routing right
on my 1999 G3 powerbook ("lombard").  I also fixed a bad bug that we
had in our free_irq() implementation.  I have been testing with a 3com
combo ethernet/modem card which seems to work just fine now, although
there does seem to be some sort of a problem on the initial card
insertion still.

Machine checks on accesses to non-existent I/O ports on powermacs.
I changed the definitions of inb/w/l and outb/w/l so that they use a
sync instruction after the load or store, instead of eieio.  If the
location doesn't respond and we get a machine check, it should occur
at the sync instruction and we then use the existing exception
handling mechanism that we use for get_user/put_user etc. to recover
and keep going.  An inb/w/l to a non-existent port will return -1
rather than crashing the system as before.

As part of this I added code to sort the exception table.  Because we
have so many sections (pmac, prep, chrp, apus, etc. as well as the
usual init section and the main text section) the exception table gets
out of order and it is necessary to sort it so that the binary search
works correctly.

Copy speedups using cache prefetching.  I found that on the G4, using
the dcbt instruction to prefetch data into the cache in the inner loop
of copy_to/from_user and copy_page gives very substantial speedups.
For example, `cat largefile >/dev/null' used to go at around 140MB/s
on my 450MHz G4 cube (assuming largefile fits into memory) and now it
goes at around 400MB/s. :-)  Interestingly, dcbt makes no difference
at all on the G3 machines I tried.  I presume it is a no-op on the G3.

The fixes are in the linuxppc_2_3 bk tree and in my rsync tree at
ppc.samba.org::linux-pmac-devel.  Cort or I will send some patches to
Linus soon and hopefully they will go in.

Paul.

--
Paul Mackerras, Senior Open Source Researcher, Linuxcare, Inc.
+61 2 6262 8990 tel, +61 2 6262 8991 fax
paulus@linuxcare.com.au, http://www.linuxcare.com.au/
Linuxcare.  Support for the revolution.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: recent fixes in devel tree
  2000-09-16  5:04 Paul Mackerras
@ 2000-09-16 11:16 ` Holger Bettag
  2000-09-16 14:25   ` Dan Malek
  0 siblings, 1 reply; 7+ messages in thread
From: Holger Bettag @ 2000-09-16 11:16 UTC (permalink / raw)
  To: linuxppc-dev


Paul Mackerras <paulus@linuxcare.com.au> writes:

[...]
> Copy speedups using cache prefetching.  I found that on the G4,
> using the dcbt instruction to prefetch data into the cache in the
> inner loop of copy_to/from_user and copy_page gives very substantial
> speedups.  For example, `cat largefile >/dev/null' used to go at
> around 140MB/s on my 450MHz G4 cube (assuming largefile fits into
> memory) and now it goes at around 400MB/s. :-) Interestingly, dcbt
> makes no difference at all on the G3 machines I tried.  I presume it
> is a no-op on the G3.
>
It shouldn't be a no-op on the G3, although the cache touch
instructions can be "turned off" globally by a bit in some HID
register.

There is one more quirk concerning memory bandwidth of the G4:
prefetching operations (both dcbt and dst) seem to be treated more
favourably by the bus interface than fetches generated by actual load
instructions, i.e. you get higher bandwidth by using cache touch
instructions even if the code consists of an endless sequence of, say,
vector loads, which should easily consume all available bandwidth on
their own.

I assume that the G4 does use pipelined, split transactions on the
MPX bus for prefetches, but not for actual loads. Maybe this is a
tradeoff between latency and throughput, or maybe it is somehow
related to a known bug in the G4's implementation of the MPX
protocol (namely, the number of outstanding transactions must be
limited to four or five, instead of the specified six).

  Holger

P.S.: BTW, is there a general consensus whether or not AltiVec

      enhancements in the kernel would be a good thing or too much
      hassle, or of interest for too few people, etc.? I don't think
      that the currently available patched gcc is reliable enough yet,
      but one day it might be.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: recent fixes in devel tree
  2000-09-16 11:16 ` Holger Bettag
@ 2000-09-16 14:25   ` Dan Malek
  2000-09-17 16:01     ` Holger Bettag
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Malek @ 2000-09-16 14:25 UTC (permalink / raw)
  To: Holger Bettag; +Cc: linuxppc-dev


Holger Bettag wrote:

> P.S.: BTW, is there a general consensus whether or not AltiVec
>       enhancements in the kernel would be a good thing.....


What kind of enhancements?  I do a fair amount of audio/video
processing on my G4 with the current gcc/Altivec tools.  I found only
one minor bug in the exception handler that I fixed a while back.

We have been discussing some of the vector load/store/touch as both
enhancements to the kernel and glibc.  It is a little more challenging
in the kernel because we would have to then keep some of the vector
context around for the kernel, but in a very constrained case we could
probably get some speed ups with that.


	-- Dan

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: recent fixes in devel tree
       [not found] <20000916152035.A517@olis.north.de>
@ 2000-09-16 18:05 ` Benjamin Herrenschmidt
  2000-09-16 21:51   ` Claus Enneper
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2000-09-16 18:05 UTC (permalink / raw)
  To: cl.en, linuxppc-dev


> >:
> >: - On my "lombard", the machine doesn't wakeup from sleep.  When I awaken
> >:   it, it just shuts off.

I'll move some sleep fixes I have in my tree to 2.4 soon (still a few
things to fix).

>just for the record:
>same story with a fresh 2.2.18pre4-ben1
>
> kernel: Unable to handle kernel NULL pointer dereference at virtual
>address 00000024 (error 20000000)
> kernel: NIP: C00CF34C XER: C000BE6F LR: C00CFA88 REGS: c4c5fcb0 TRAP: 0300
> kernel: MSR: 00009032 [EEIRDRME]
> kernel: TASK = c4c5e000[299] 'pmud' mm->pgd c4cbe000 Last syscall: 54
> kernel: GPR00: C00CFA88 C4C5FD60 C4C5E000 C01CA7D8 00000000
>C01492F8 00004000 C01BF250
> kernel: GPR08: 00000000 00000000 F3000020 C045AAA0 55553535 1001E8BC
>00000000 100A2B10
> kernel: GPR16: 00000000 DEADBEEF 00000000 00000000 00009032 04C5FE80
>C01CABB4 C01CA650
> kernel: GPR24: C0180000 C01CABB0 00000002 00418570 00000008 00000001
>00000000 C01CA7D8
> kernel: Call backtrace:
> kernel: C0179E2C C00CFA88 C00F2BCC C0185B00 C00F2FAC C003C310 C000392
> kernel: 0FF8D9A4 1000302C 100015D4 0FF0B69C 00000000
> kernel: Kernel panic: kernel access of bad area pc c00cf34c lr c00cfa88
>address 24 tsk pmud/299
>

Can you look up in System.map where the NIP and LR are? And possibly a
bit of the backtrace? I'll check on my side if I see something.

Ben.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: recent fixes in devel tree
  2000-09-16 18:05 ` recent fixes in devel tree Benjamin Herrenschmidt
@ 2000-09-16 21:51   ` Claus Enneper
  0 siblings, 0 replies; 7+ messages in thread
From: Claus Enneper @ 2000-09-16 21:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

Hi Benjamin!

 >: > kernel: Call backtrace:
 >: > kernel: C0179E2C C00CFA88 C00F2BCC C0185B00 C00F2FAC C003C310 C000392
 >: > kernel: 0FF8D9A4 1000302C 100015D4 0FF0B69C 00000000
 >: > kernel: Kernel panic: kernel access of bad area pc c00cf34c lr c00cfa88
 >: >address 24 tsk pmud/299
 >: >
 >: 
 >: Can you lookup in System.map where NIP and LR are ? And eventually a bit
 >: of the backtrace ? I'll check on my side if I see something.

I've built it again to be sure, and it did not change the situation.

But there's neither 'pc c00cf34c' nor 'lr c00cfa88' in System.map:
c00cf060 t pmac_ide_dmaproc
c00cf218 t idepmac_sleep_device
c00cf334 t idepmac_wake_device
c00cf400 t idepmac_sleep_interface
c00cf4e4 t idepmac_wake_interface
c00cf728 t idepmac_notify_sleep
c00cfab0 T ide_dma_intr
c00cfb8c T ide_build_dmatable

Is this a plausible scenario?
It looks as if the System.map does not match the kernel...
 
It seems best to rsync again and hope.
Is linuxppc.org a reliable server, or should I use ppc.samba.org?



		Claus

[-- Attachment #2: Type: application/pgp-signature, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: recent fixes in devel tree
  2000-09-16 14:25   ` Dan Malek
@ 2000-09-17 16:01     ` Holger Bettag
  2000-09-19  3:06       ` Dan Malek
  0 siblings, 1 reply; 7+ messages in thread
From: Holger Bettag @ 2000-09-17 16:01 UTC (permalink / raw)
  To: linuxppc-dev


Dan Malek <dan@mvista.com> writes:

>
> Holger Bettag wrote:
>
> > P.S.: BTW, is there a general consensus whether or not AltiVec
> >       enhancements in the kernel would be a good thing.....
>
>
> What kind of enhancements?  I do a fair amount of audio/video
> processing on my G4 with the current gcc/Altivec tools.  I found only
> one minor bug in the exception handler that I fixed a while back.
>
> We have been discussing some of the vector load/store/touch as both
> enhancements to the kernel and glibc.  It is a little more challenging
> in the kernel because we would have to then keep some of the vector
> context around for the kernel, but in a very constrained case we could
> probably get some speed ups with that.
>
I know too little about the kernel to suggest specific tunings. I
thought along the lines you mention, inserting a 'data stream touch'
here and there for improved bus utilization, occasional use of AltiVec
code where it tends to speed things up a lot, like doing checksums for
software RAIDs or other fast I/O channels. Maybe 'generic' video
acceleration of the OpenFirmware framebuffer device, or something like
that. (Maybe I/O en-/decryption, if such functionality ever ends up in
the kernel.)

I mainly wanted to know whether there are efforts underway to make the
use of AltiVec in the kernel a no-brainer (like: "just pass -fvec and
use the AltiVec-'macros'!"), or whether it will be only for wizards at
the bleeding edge.

  Holger

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: recent fixes in devel tree
  2000-09-17 16:01     ` Holger Bettag
@ 2000-09-19  3:06       ` Dan Malek
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Malek @ 2000-09-19  3:06 UTC (permalink / raw)
  To: Holger Bettag; +Cc: linuxppc-dev


Holger Bettag wrote:

> I know too little about the kernel to suggest specific tunings.

Suggestions always make people think......

> .... I
> thought along the lines you mention, inserting a 'data stream touch'
> here and there for improved bus utilization, occasional use of AltiVec
> code where it tends to speed things up a lot,

The data stream touch may be useful in some very specific places.
I have written user level performance tests to see where it is
beneficial, and you have to be careful.  Yes, it can speed things
up when all of the bits line up just right, but you can also screw up
with it as well :-).  Oh, I use it, but it isn't a quick and generic
solution.

> I mainly wanted to know whether there are efforts underway...
> .... or whether it will be only for wizards at
> the bleeding edge.

Oh, there are a few.  Bleeding edge is a good description, but
wizard is more like 'wizzing' (perhaps into the wind :-) in some
cases....

I'm working on several user applications (audio, video, MPEG, etc)
that utilize Altivec, so I'm getting a pretty good idea how to tune
data streams and caches.  Some of this will probably find its way
into the C library (and has already in some of the MPEG libraries)
for performance enhancement.  The challenge in the kernel is making
the Altivec context available for the kernel.  The Altivec is faster,
but it comes at a higher set-up cost that you also have to consider.
We'll get some in there.  It is definitely hardware that will get
utilized more and more.


	-- Dan

--

	I like MMUs because I don't have a real life.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2000-09-19  3:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20000916152035.A517@olis.north.de>
2000-09-16 18:05 ` recent fixes in devel tree Benjamin Herrenschmidt
2000-09-16 21:51   ` Claus Enneper
2000-09-16  5:04 Paul Mackerras
2000-09-16 11:16 ` Holger Bettag
2000-09-16 14:25   ` Dan Malek
2000-09-17 16:01     ` Holger Bettag
2000-09-19  3:06       ` Dan Malek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).