linuxppc-dev.lists.ozlabs.org archive mirror
* kernel oops due to unaligned access with lswi
@ 2003-11-15 21:04 Olaf Hering
  2003-11-15 22:24 ` Olaf Hering
  2003-11-16  0:40 ` Paul Mackerras
  0 siblings, 2 replies; 24+ messages in thread
From: Olaf Hering @ 2003-11-15 21:04 UTC (permalink / raw)
  To: Alan Modra; +Cc: linuxppc-dev


Alan,

2.6 has a function parse_header(); it's part of gunzip and is used for
initramfs, for example. I got an oops: exception 600, sig 7. I think that
means an unaligned access.

Why does the compiler optimize this one? I see it with 2.95, 3.2, 3.3
and 3.4 at -O1 and higher, but not at -O0.

gcc -msoft-float -mmultiple -mstring -O2 -Wall -Wstrict-prototypes \
-Wno-trigraphs -Wno-uninitialized -version -fno-strict-aliasing \
-fno-common -ffixed-r2 -fomit-frame-pointer -o parse_header \
 -c parse_header.c -v --save-temps

	lswi 9,31,8
	stswi 9,28,8

s = r31. How can gcc be sure that s is aligned?
ppc64 will call memcpy (tested with gcc 3.2.3).

	.file	"parse_header.c"
	.section	.init.text,"ax",@progbits
	.align 2
	.type	parse_header,@function
parse_header:
	stwu 1,-112(1)
	mflr 0
	stmw 27,92(1)
	stw 0,116(1)
	li 0,0
	stb 0,64(1)
	li 30,0
	addi 31,3,6
	addi 28,1,56
	addi 27,1,8
.L6:
	lswi 9,31,8
	stswi 9,28,8
	slwi 29,30,2
	mr 3,28
	li 4,0
	li 5,16
	bl simple_strtoul
	stwx 3,29,27
	addi 30,30,1
	addi 31,31,8
	cmpwi 0,30,11
	ble+ 0,.L6
	lwz 0,116(1)
	mtlr 0
	lmw 27,92(1)
	addi 1,1,112
	blr
.Lfe1:
	.size	parse_header,.Lfe1-parse_header
	.section	".text"
	.align 2
	.globl main
	.type	main,@function
main:
	stwu 1,-16(1)
	mflr 0
	stw 0,20(1)
	lis 3,0x1
	ori 3,3,57920
	bl malloc
	bl parse_header
	li 3,0
	lwz 0,20(1)
	mtlr 0
	addi 1,1,16
	blr
.Lfe2:
	.size	main,.Lfe2-main
	.ident	"GCC: (GNU) 3.2.3 (SuSE Linux)"


#include <stdlib.h>
typedef unsigned int __kernel_size_t;
extern void *memcpy(void *, const void *, __kernel_size_t);
extern unsigned long simple_strtoul(const char *, char **, unsigned int);

static void __attribute__ ((__section__(".init.text"))) parse_header(char *s)
{
	unsigned long parsed[12];
	char buf[9];
	int i;

	buf[8] = '\0';
	for (i = 0, s += 6; i < 12; i++, s += 8) {
		memcpy(buf, s, 8);
		parsed[i] = simple_strtoul(buf, ((void *) 0), 16);
	}
}

int
main(void)
{
	char *s;
	s = malloc(123456);
	parse_header(s);
	return 0;
}
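
For readers following the test case, a minimal userspace sketch of the same loop may help. It assumes the input looks like a cpio "newc" style header, that is, a 6-byte magic followed by 8-digit hex fields, which is what the `s += 6` and `s += 8` steps in parse_header() suggest. The names `parse_fields`, `field_offset` and `HDR_FIELDS` are illustrative only, and `strtoul` stands in for the kernel's `simple_strtoul`. The point of the sketch is that field i starts at byte 6 + 8*i, so if the header itself is word-aligned, every memcpy source sits 2 bytes past a word boundary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HDR_FIELDS 12	/* illustrative name; matches parsed[12] above */

/* Byte offset of field i from the start of the header (6-byte magic first). */
static unsigned int field_offset(int i)
{
	return 6u + 8u * (unsigned int)i;
}

/* Parse twelve 8-digit hex fields that follow the 6-byte magic. */
static void parse_fields(const char *s, unsigned long parsed[HDR_FIELDS])
{
	char buf[9];
	int i;

	assert(s != NULL);
	buf[8] = '\0';
	for (i = 0; i < HDR_FIELDS; i++) {
		/* The source address is s + 6 + 8*i, never word-aligned
		 * when s itself is word-aligned. */
		memcpy(buf, s + field_offset(i), 8);
		parsed[i] = strtoul(buf, NULL, 16);
	}
}
```

Whether the misalignment matters depends on what the compiler emits for the 8-byte memcpy and on how the CPU handles it, which is what the rest of this thread is about.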
--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: kernel oops due to unaligned access with lswi
  2003-11-15 21:04 kernel oops due to unaligned access with lswi Olaf Hering
@ 2003-11-15 22:24 ` Olaf Hering
  2003-11-15 22:30   ` David Edelsohn
  2003-11-16  0:40 ` Paul Mackerras
  1 sibling, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2003-11-15 22:24 UTC (permalink / raw)
  To: Alan Modra; +Cc: linuxppc-dev


 On Sat, Nov 15, Olaf Hering wrote:

>
> Alan,
>
> 2.6 has a function parse_header(); it's part of gunzip and is used for
> initramfs, for example. I got an oops: exception 600, sig 7. I think that
> means an unaligned access.

This might be a useable workaround.

--- ../linuxppc-2.5_2.6.0-test9-bk.orig/init/initramfs.c        2003-10-18 17:25:50.000000000 +0200
+++ init/initramfs.c    2003-11-15 23:09:57.000000000 +0100
@@ -100,11 +100,11 @@ static void __init parse_header(char *s)
 {
        unsigned long parsed[12];
        char buf[9];
-       int i;
+       int i, j = 1;

        buf[8] = '\0';
        for (i = 0, s += 6; i < 12; i++, s += 8) {
-               memcpy(buf, s, 8);
+               memcpy(buf, s, 7 + j); /* s might be unaligned, gcc will optimized the call to lswi on ppc */
                parsed[i] = simple_strtoul(buf, NULL, 16);
        }
        ino = parsed[0];


--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG


* Re: kernel oops due to unaligned access with lswi
  2003-11-15 22:24 ` Olaf Hering
@ 2003-11-15 22:30   ` David Edelsohn
  2003-11-15 22:37     ` Olaf Hering
  2003-11-15 22:43     ` Olaf Hering
  0 siblings, 2 replies; 24+ messages in thread
From: David Edelsohn @ 2003-11-15 22:30 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Alan Modra, linuxppc-dev


>>>>> Olaf Hering writes:

Olaf> 2.6 has a function parse_header(); it's part of gunzip and is used for
Olaf> initramfs, for example. I got an oops: exception 600, sig 7. I think that
Olaf> means an unaligned access.

Olaf> +               memcpy(buf, s, 7 + j); /* s might be unaligned, gcc will optimized the call to lswi on ppc */

	lswi specifically accepts unaligned addresses.

david


* Re: kernel oops due to unaligned access with lswi
  2003-11-15 22:30   ` David Edelsohn
@ 2003-11-15 22:37     ` Olaf Hering
  2003-11-15 22:43     ` Olaf Hering
  1 sibling, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2003-11-15 22:37 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Alan Modra, linuxppc-dev


 On Sat, Nov 15, David Edelsohn wrote:

> >>>>> Olaf Hering writes:
>
> Olaf> 2.6 has a function parse_header(); it's part of gunzip and is used for
> Olaf> initramfs, for example. I got an oops: exception 600, sig 7. I think that
> Olaf> means an unaligned access.
>
> Olaf> +               memcpy(buf, s, 7 + j); /* s might be unaligned, gcc will optimized the call to lswi on ppc */
>
> 	lswi specifically accepts unaligned addresses.

Thanks. Then the bug is somewhere else, because it crashes like this if
I use '8' instead of '7 + j' (no memcpy call):

Calibrating delay loop... 89.49 BogoMIPS
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Serial port locked ON by debugger !
vector: 600 at pc = c025bf14, lr = c025bf20
msr = 9030, sp = c02598c0 [c0259810]
dar = c043cffa, dsisr = a53f
current = c02191c8, pid = 0, comm = swapper
mon> di c025bf14
c025bf14  7d3f44aa      lswi    r9,r31,8
c025bf18  7d3c45aa      stswi   r9,r28,8
c025bf1c  4be62ee1      bl      0xc00bedfc
c025bf20  2c1e000b      cmpwi   r30,11
c025bf24  3bff0008      addi    r31,r31,8
c025bf28  7c7dd92e      stwx    r3,r29,r27
c025bf2c  4081ffd4      ble     0xc025bf00
c025bf30  8001002c      lwz     r0,44(r1)
c025bf34  81210030      lwz     r9,48(r1)
c025bf38  5400a016      rlwinm  r0,r0,20,0,11
c025bf3c  81610008      lwz     r11,8(r1)
c025bf40  7c004b78      or      r0,r0,r9
c025bf44  5409a32e      rlwinm  r9,r0,20,12,23
c025bf48  540a063e      clrlwi  r10,r0,24
c025bf4c  7d4a4b78      or      r10,r10,r9
c025bf50  54006016      rlwinm  r0,r0,12,0,11
mon> r
R00 = 00000008   R01 = c02598c0   R02 = c02191c8   R03 = c02598f8
R04 = 00000000   R05 = 00000010   R06 = c043cfb1   R07 = 00000001
R08 = 00000000   R09 = c02224f4   R10 = ffffffd0   R11 = 00000000
R12 = 00014fac   R13 = deadbeef   R14 = deadbeef   R15 = deadbeef
R16 = deadbeef   R17 = deadbeef   R18 = deadbeef   R19 = deadbeef
R20 = 0000003f   R21 = 000001ff   R22 = c47f1408   R23 = c032e008
R24 = 00000006   R25 = 00000009   R26 = 0000000f   R27 = c02598c8
R28 = c02598f8   R29 = 00000024   R30 = 0000000a   R31 = c043cffa
pc  = c025bf14   msr = 00009030   lr  = c025bf20   cr  = 95000053
ctr = 00000000   xer = c000be6f   trap =  600
mon>


pmac 7200/90 with 601 cpu.


--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG


* Re: kernel oops due to unaligned access with lswi
  2003-11-15 22:30   ` David Edelsohn
  2003-11-15 22:37     ` Olaf Hering
@ 2003-11-15 22:43     ` Olaf Hering
  2003-11-15 22:59       ` David Edelsohn
  1 sibling, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2003-11-15 22:43 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Alan Modra, linuxppc-dev


 On Sat, Nov 15, David Edelsohn wrote:

> >>>>> Olaf Hering writes:
>
> Olaf> 2.6 has a function parse_header(); it's part of gunzip and is used for
> Olaf> initramfs, for example. I got an oops: exception 600, sig 7. I think that
> Olaf> means an unaligned access.
>
> Olaf> +               memcpy(buf, s, 7 + j); /* s might be unaligned, gcc will optimized the call to lswi on ppc */
>
> 	lswi specifically accepts unaligned addresses.

The asm diff looks like this:

--- initramfs-8.s       2003-11-15 23:41:12.000000000 +0100
+++ initramfs-7+j.s     2003-11-15 23:41:36.000000000 +0100
@@ -165,16 +165,18 @@
        addi 28,1,56
        addi 27,1,8
 .L85:
+       mr 4,31
+       li 5,8
+       mr 3,28
        slwi 29,30,2
+       bl memcpy
+       addi 30,30,1
        mr 3,28
        li 4,0
        li 5,16
-       addi 30,30,1
-       lswi 9,31,8
-       stswi 9,28,8
+       addi 31,31,8
        bl simple_strtoul
        cmpwi 0,30,11
-       addi 31,31,8
        stwx 3,29,27
        ble+ 0,.L85
        lwz 0,44(1)


--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG


* Re: kernel oops due to unaligned access with lswi
  2003-11-15 22:43     ` Olaf Hering
@ 2003-11-15 22:59       ` David Edelsohn
  2003-11-16 10:17         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 24+ messages in thread
From: David Edelsohn @ 2003-11-15 22:59 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Alan Modra, linuxppc-dev


	I didn't mean that lswi cannot take an alignment exception on some
PPC implementations, but that lswi is supposed to be able to handle block
loads from addresses with arbitrary alignment.

David



* Re: kernel oops due to unaligned access with lswi
  2003-11-15 21:04 kernel oops due to unaligned access with lswi Olaf Hering
  2003-11-15 22:24 ` Olaf Hering
@ 2003-11-16  0:40 ` Paul Mackerras
  2003-11-16  1:45   ` Olaf Hering
  2003-11-16  5:07   ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 24+ messages in thread
From: Paul Mackerras @ 2003-11-16  0:40 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Alan Modra, linuxppc-dev


Olaf,

> 	lswi 9,31,8
> 	stswi 9,28,8
>
> s = r31. How can gcc be sure that s is aligned?

What machine is this?  I looked at the manuals for 750, 7450, POWER4
and they all handle unaligned string ops in hardware.  The alignment
handler doesn't handle string ops, I believe, although it could.  And
which arch (ppc32 or ppc64)?

Paul.


* Re: kernel oops due to unaligned access with lswi
  2003-11-16  0:40 ` Paul Mackerras
@ 2003-11-16  1:45   ` Olaf Hering
  2003-11-16 16:49     ` Olaf Hering
  2003-11-16  5:07   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2003-11-16  1:45 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Alan Modra, linuxppc-dev


 On Sun, Nov 16, Paul Mackerras wrote:

> Olaf,
>
> > 	lswi 9,31,8
> > 	stswi 9,28,8
> >
> > s = r31. How can gcc be sure that s is aligned?
>
> What machine is this?  I looked at the manuals for 750, 7450, POWER4
> and they all handle unaligned string ops in hardware.  The alignment
> handler doesn't handle string ops, I believe, although it could.  And
> which arch (ppc32 or ppc64)?

It's a 7200/90 with a 601 CPU. I'm afraid zlib.c also needs tweaking;
I think the gcc built-in memcpy is used in the bootloader. It is the
same issue: 'DEFAULT CATCH!, code=FFF00600' without this change (which
also adds zlib debugging, though that doesn't work for prepboot right now
if enabled).
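
As a sketch of the workaround idea (not the code from the patch below), a copy routine that moves one byte at a time gives the compiler no fixed-size block copy to expand into lswi/stswi. `byte_copy` is a hypothetical helper name. The `volatile` qualifiers are an assumption added here to discourage an optimizer from merging the byte accesses; the patch itself uses a plain loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy n bytes from src to dst, one byte per access.
 * Hypothetical helper; regions must not overlap.
 */
static void *byte_copy(void *dst, const void *src, size_t n)
{
	volatile unsigned char *d = dst;
	const volatile unsigned char *s = src;
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < n; i++)
		d[i] = s[i];	/* single-byte load and store */
	return dst;
}
```

This trades speed for predictability, which is usually acceptable in a boot wrapper where no alignment handler is available.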


--- ../linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/lib/zlib.c        2003-09-12 18:26:51.000000000 +0200
+++ arch/ppc/boot/lib/zlib.c    2003-11-16 02:43:22.000000000 +0100
@@ -1,3 +1,5 @@
+#define DEBUG_ZLIB 1
+#define verbose 1
 /*
  * This file is derived from various .h and .c files from the zlib-0.95
  * distribution by Jean-loup Gailly and Mark Adler, with some additions
@@ -85,11 +87,11 @@ extern char *z_errmsg[]; /* indexed by 1

 /* Diagnostic functions */
 #ifdef DEBUG_ZLIB
-#  include <stdio.h>
+#  include <nonstdio.h>
 #  ifndef verbose
 #    define verbose 0
 #  endif
-#  define Assert(cond,msg) {if(!(cond)) z_error(msg);}
+#  define Assert(cond,msg) {if(!(cond)) printf(msg);}
 #  define Trace(x) fprintf x
 #  define Tracev(x) {if (verbose) fprintf x ;}
 #  define Tracevv(x) {if (verbose>1) fprintf x ;}
@@ -884,7 +886,14 @@ local int inflate_blocks(
       t = s->sub.left;
       if (t > n) t = n;
       if (t > m) t = m;
+#if 0
       zmemcpy(q, p, t);
+#else
+      {
+       int i;
+       for(i=0;i <t;i++)q[i]=p[i];
+      }
+#endif
       p += t;  n -= t;
       q += t;  m -= t;
       if ((s->sub.left -= t) != 0)
@@ -1230,7 +1239,7 @@ local uInt cpdext[] = { /* Extra bits fo
 #define N_MAX 288       /* maximum number of codes in any set */

 #ifdef DEBUG_ZLIB
-  uInt inflate_hufts;
+  local uInt inflate_hufts;
 #endif

 local int huft_build(


--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG


* Re: kernel oops due to unaligned access with lswi
  2003-11-16  0:40 ` Paul Mackerras
  2003-11-16  1:45   ` Olaf Hering
@ 2003-11-16  5:07   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 24+ messages in thread
From: Benjamin Herrenschmidt @ 2003-11-16  5:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Olaf Hering, Alan Modra, linuxppc-dev list


On Sun, 2003-11-16 at 11:40, Paul Mackerras wrote:
> Olaf,
>
> > 	lswi 9,31,8
> > 	stswi 9,28,8
> >
> > s = r31. How can gcc be sure that s is aligned?
>
> What machine is this?  I looked at the manuals for 750, 7450, POWER4
> and they all handle unaligned string ops in hardware.  The alignment
> handler doesn't handle string ops, I believe, although it could.  And
> which arch (ppc32 or ppc64)?

One of your old friends actually:

olaf> pmac 7200/90 with 601 cpu.

:)

Ben.



* Re: kernel oops due to unaligned access with lswi
  2003-11-15 22:59       ` David Edelsohn
@ 2003-11-16 10:17         ` Benjamin Herrenschmidt
  2003-11-16 17:49           ` Kumar Gala
  2003-11-16 23:04           ` David Edelsohn
  0 siblings, 2 replies; 24+ messages in thread
From: Benjamin Herrenschmidt @ 2003-11-16 10:17 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Olaf Hering, Alan Modra, linuxppc-dev list


On Sun, 2003-11-16 at 09:59, David Edelsohn wrote:
> 	I didn't mean that lswi cannot take an alignment exception on some
> PPC implementations, but that lswi is supposed to be able to handle block
> loads from addresses with arbitrary alignment.

I remember being regularly told (I think by Apple, while I was still
doing MacOS hacking) that those string instructions were evil,
deprecated, and should be avoided, as they didn't perform better
than the equivalent set of load/store instructions. Is this
still true? If so, we may want to avoid generating them
from gcc.

Also, if the 601 really does take alignment exceptions on these,
it's quite bad to have gcc generate them implicitly for memcpy calls,
since our OFs don't seem to implement an alignment handler for them,
which breaks our boot wrappers.

Finally, the pem32b at least seems clear about discouraging
their use, especially for non-aligned accesses. It looks like a
weird optimisation to do for memcpy.

Ben.



* Re: kernel oops due to unaligned access with lswi
  2003-11-16  1:45   ` Olaf Hering
@ 2003-11-16 16:49     ` Olaf Hering
  0 siblings, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2003-11-16 16:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Alan Modra, linuxppc-dev


 On Sun, Nov 16, Olaf Hering wrote:

>
>  On Sun, Nov 16, Paul Mackerras wrote:
>
> > Olaf,
> >
> > > 	lswi 9,31,8
> > > 	stswi 9,28,8
> > >
> > > s = r31. How can gcc be sure that s is aligned?
> >
> > What machine is this?  I looked at the manuals for 750, 7450, POWER4
> > and they all handle unaligned string ops in hardware.  The alignment
> > handler doesn't handle string ops, I believe, although it could.  And
> > which arch (ppc32 or ppc64)?
>
> It's a 7200/90 with a 601 CPU. I'm afraid zlib.c also needs tweaking;
> I think the gcc built-in memcpy is used in the bootloader. It is the
> same issue: 'DEFAULT CATCH!, code=FFF00600' without this change (which
> also adds zlib debugging, though that doesn't work for prepboot right now
> if enabled).

This patch allows zlib debugging, defines a dummy printf for prep,
and prints all 4 bytes of a pointer in the coff bootloader.


diff -x bin -x ash -x klibc-0.81 -purNX /home/olaf/kernel/kernel_exclude.txt linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/common/misc-common.c linuxppc-2.5_2.6.0-test9-bk/arch/ppc/boot/common/misc-common.c
--- linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/common/misc-common.c	2003-09-12 19:39:38.000000000 +0200
+++ linuxppc-2.5_2.6.0-test9-bk/arch/ppc/boot/common/misc-common.c	2003-11-16 17:42:36.000000000 +0100
@@ -67,6 +67,8 @@ extern unsigned char serial_getc(unsigne
 extern void serial_putc(unsigned long com_port, unsigned char c);
 #endif

+int printf(const char *fmt, ...) { return 0; }
+
 void pause(void)
 {
 	puts("pause\n");
diff -x bin -x ash -x klibc-0.81 -purNX /home/olaf/kernel/kernel_exclude.txt linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/lib/zlib.c linuxppc-2.5_2.6.0-test9-bk/arch/ppc/boot/lib/zlib.c
--- linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/lib/zlib.c	2003-09-12 18:26:51.000000000 +0200
+++ linuxppc-2.5_2.6.0-test9-bk/arch/ppc/boot/lib/zlib.c	2003-11-16 17:35:40.000000000 +0100
@@ -1,3 +1,7 @@
+#if 0
+#define DEBUG_ZLIB 1
+#define verbose 1
+#endif
 /*
  * This file is derived from various .h and .c files from the zlib-0.95
  * distribution by Jean-loup Gailly and Mark Adler, with some additions
@@ -85,16 +89,16 @@ extern char *z_errmsg[]; /* indexed by 1

 /* Diagnostic functions */
 #ifdef DEBUG_ZLIB
-#  include <stdio.h>
+#  include <nonstdio.h>
 #  ifndef verbose
 #    define verbose 0
 #  endif
-#  define Assert(cond,msg) {if(!(cond)) z_error(msg);}
-#  define Trace(x) fprintf x
-#  define Tracev(x) {if (verbose) fprintf x ;}
-#  define Tracevv(x) {if (verbose>1) fprintf x ;}
-#  define Tracec(c,x) {if (verbose && (c)) fprintf x ;}
-#  define Tracecv(c,x) {if (verbose>1 && (c)) fprintf x ;}
+#  define Assert(cond,msg) {if(!(cond)) printf(msg);}
+#  define Trace(x) printf x
+#  define Tracev(x) {if (verbose) printf x ;}
+#  define Tracevv(x) {if (verbose>1) printf x ;}
+#  define Tracec(c,x) {if (verbose && (c)) printf x ;}
+#  define Tracecv(c,x) {if (verbose>1 && (c)) printf x ;}
 #else
 #  define Assert(cond,msg)
 #  define Trace(x)
@@ -311,7 +315,7 @@ int inflateReset(
   z->msg = Z_NULL;
   z->state->mode = z->state->nowrap ? BLOCKS : METHOD;
   inflate_blocks_reset(z->state->blocks, z, &c);
-  Trace((stderr, "inflate: reset\n"));
+  Trace(("inflate: reset\n"));
   return Z_OK;
 }

@@ -328,7 +332,7 @@ int inflateEnd(
     inflate_blocks_free(z->state->blocks, z, &c);
   ZFREE(z, z->state, sizeof(struct internal_state));
   z->state = Z_NULL;
-  Trace((stderr, "inflate: end\n"));
+  Trace(("inflate: end\n"));
   return Z_OK;
 }

@@ -372,7 +376,7 @@ int inflateInit2(
     inflateEnd(z);
     return Z_MEM_ERROR;
   }
-  Trace((stderr, "inflate: allocated\n"));
+  Trace(("inflate: allocated\n"));

   /* reset state */
   inflateReset(z);
@@ -437,7 +441,7 @@ int inflate(
         z->state->sub.marker = 5;       /* can't try inflateSync */
         break;
       }
-      Trace((stderr, "inflate: zlib header ok\n"));
+      Trace(("inflate: zlib header ok\n"));
       z->state->mode = BLOCKS;
     case BLOCKS:
       r = inflate_blocks(z->state->blocks, z, r);
@@ -482,7 +486,7 @@ int inflate(
         z->state->sub.marker = 5;       /* can't try inflateSync */
         break;
       }
-      Trace((stderr, "inflate: zlib check ok\n"));
+      Trace(("inflate: zlib check ok\n"));
       z->state->mode = DONE;
     case DONE:
       return Z_STREAM_END;
@@ -766,7 +770,7 @@ local void inflate_blocks_reset(
   s->read = s->write = s->window;
   if (s->checkfn != Z_NULL)
     s->check = (*s->checkfn)(0L, Z_NULL, 0);
-  Trace((stderr, "inflate:   blocks reset\n"));
+  Trace(("inflate:   blocks reset\n"));
 }


@@ -789,7 +793,7 @@ local inflate_blocks_statef *inflate_blo
   s->end = s->window + w;
   s->checkfn = c;
   s->mode = TYPE;
-  Trace((stderr, "inflate:   blocks allocated\n"));
+  Trace(("inflate:   blocks allocated\n"));
   inflate_blocks_reset(s, z, &s->check);
   return s;
 }
@@ -822,7 +826,7 @@ local int inflate_blocks(
       switch (t >> 1)
       {
         case 0:                         /* stored */
-          Trace((stderr, "inflate:     stored block%s\n",
+          Trace(("inflate:     stored block%s\n",
                  s->last ? " (last)" : ""));
           DUMPBITS(3)
           t = k & 7;                    /* go to byte boundary */
@@ -830,7 +834,7 @@ local int inflate_blocks(
           s->mode = LENS;               /* get length of stored block */
           break;
         case 1:                         /* fixed */
-          Trace((stderr, "inflate:     fixed codes block%s\n",
+          Trace(("inflate:     fixed codes block%s\n",
                  s->last ? " (last)" : ""));
           {
             uInt bl, bd;
@@ -850,7 +854,7 @@ local int inflate_blocks(
           s->mode = CODES;
           break;
         case 2:                         /* dynamic */
-          Trace((stderr, "inflate:     dynamic codes block%s\n",
+          Trace(("inflate:     dynamic codes block%s\n",
                  s->last ? " (last)" : ""));
           DUMPBITS(3)
           s->mode = TABLE;
@@ -874,7 +878,7 @@ local int inflate_blocks(
       }
       s->sub.left = (uInt)b & 0xffff;
       b = k = 0;                      /* dump bits */
-      Tracev((stderr, "inflate:       stored length %u\n", s->sub.left));
+      Tracev(("inflate:       stored length %u\n", s->sub.left));
       s->mode = s->sub.left ? STORED : TYPE;
       break;
     case STORED:
@@ -884,12 +888,16 @@ local int inflate_blocks(
       t = s->sub.left;
       if (t > n) t = n;
       if (t > m) t = m;
+#if 0
       zmemcpy(q, p, t);
+#else
+      { int i; for(i=0;i <t;i++)q[i]=p[i]; }
+#endif
       p += t;  n -= t;
       q += t;  m -= t;
       if ((s->sub.left -= t) != 0)
         break;
-      Tracev((stderr, "inflate:       stored end, %lu total out\n",
+      Tracev(("inflate:       stored end, %lu total out\n",
               z->total_out + (q >= s->read ? q - s->read :
               (s->end - s->read) + (q - s->window))));
       s->mode = s->last ? DRY : TYPE;
@@ -917,7 +925,7 @@ local int inflate_blocks(
       s->sub.trees.nblens = t;
       DUMPBITS(14)
       s->sub.trees.index = 0;
-      Tracev((stderr, "inflate:       table sizes ok\n"));
+      Tracev(("inflate:       table sizes ok\n"));
       s->mode = BTREE;
     case BTREE:
       while (s->sub.trees.index < 4 + (s->sub.trees.table >> 10))
@@ -939,7 +947,7 @@ local int inflate_blocks(
         LEAVE
       }
       s->sub.trees.index = 0;
-      Tracev((stderr, "inflate:       bits tree ok\n"));
+      Tracev(("inflate:       bits tree ok\n"));
       s->mode = DTREE;
     case DTREE:
       while (t = s->sub.trees.table,
@@ -1002,7 +1010,7 @@ local int inflate_blocks(
           r = t;
           LEAVE
         }
-        Tracev((stderr, "inflate:       trees ok\n"));
+        Tracev(("inflate:       trees ok\n"));
         if ((c = inflate_codes_new(bl, bd, tl, td, z)) == Z_NULL)
         {
           inflate_trees_free(td, z);
@@ -1025,7 +1033,7 @@ local int inflate_blocks(
       inflate_trees_free(s->sub.decode.td, z);
       inflate_trees_free(s->sub.decode.tl, z);
       LOAD
-      Tracev((stderr, "inflate:       codes end, %lu total out\n",
+      Tracev(("inflate:       codes end, %lu total out\n",
               z->total_out + (q >= s->read ? q - s->read :
               (s->end - s->read) + (q - s->window))));
       if (!s->last)
@@ -1068,7 +1076,7 @@ local int inflate_blocks_free(
   inflate_blocks_reset(s, z, c);
   ZFREE(z, s->window, s->end - s->window);
   ZFREE(z, s, sizeof(struct inflate_blocks_state));
-  Trace((stderr, "inflate:   blocks freed\n"));
+  Trace(("inflate:   blocks freed\n"));
   return Z_OK;
 }

@@ -1230,7 +1238,7 @@ local uInt cpdext[] = { /* Extra bits fo
 #define N_MAX 288       /* maximum number of codes in any set */

 #ifdef DEBUG_ZLIB
-  uInt inflate_hufts;
+  local uInt inflate_hufts;
 #endif

 local int huft_build(
@@ -1687,7 +1695,7 @@ local inflate_codes_statef *inflate_code
     c->dbits = (Byte)bd;
     c->ltree = tl;
     c->dtree = td;
-    Tracev((stderr, "inflate:       codes new\n"));
+    Tracev(("inflate:       codes new\n"));
   }
   return c;
 }
@@ -1743,7 +1751,7 @@ local int inflate_codes(
       if (e == 0)               /* literal */
       {
         c->sub.lit = t->base;
-        Tracevv((stderr, t->base >= 0x20 && t->base < 0x7f ?
+        Tracevv((t->base >= 0x20 && t->base < 0x7f ?
                  "inflate:         literal '%c'\n" :
                  "inflate:         literal 0x%02x\n", t->base));
         c->mode = LIT;
@@ -1764,7 +1772,7 @@ local int inflate_codes(
       }
       if (e & 32)               /* end of block */
       {
-        Tracevv((stderr, "inflate:         end of block\n"));
+        Tracevv(("inflate:         end of block\n"));
         c->mode = WASH;
         break;
       }
@@ -1779,7 +1787,7 @@ local int inflate_codes(
       DUMPBITS(j)
       c->sub.code.need = c->dbits;
       c->sub.code.tree = c->dtree;
-      Tracevv((stderr, "inflate:         length %u\n", c->len));
+      Tracevv(("inflate:         length %u\n", c->len));
       c->mode = DIST;
     case DIST:          /* i: get distance next */
       j = c->sub.code.need;
@@ -1809,7 +1817,7 @@ local int inflate_codes(
       NEEDBITS(j)
       c->sub.copy.dist += (uInt)b & inflate_mask[j];
       DUMPBITS(j)
-      Tracevv((stderr, "inflate:         distance %u\n", c->sub.copy.dist));
+      Tracevv(("inflate:         distance %u\n", c->sub.copy.dist));
       c->mode = COPY;
     case COPY:          /* o: copying bytes in window, waiting for space */
 #ifndef __TURBOC__ /* Turbo C bug for following expression */
@@ -1860,7 +1868,7 @@ local void inflate_codes_free(
 )
 {
   ZFREE(z, c, sizeof(struct inflate_codes_state));
-  Tracev((stderr, "inflate:       codes free\n"));
+  Tracev(("inflate:       codes free\n"));
 }

 /*+++++*/
@@ -1995,7 +2003,7 @@ local int inflate_fast(
     if ((e = (t = tl + ((uInt)b & ml))->exop) == 0)
     {
       DUMPBITS(t->bits)
-      Tracevv((stderr, t->base >= 0x20 && t->base < 0x7f ?
+      Tracevv((t->base >= 0x20 && t->base < 0x7f ?
                 "inflate:         * literal '%c'\n" :
                 "inflate:         * literal 0x%02x\n", t->base));
       *q++ = (Byte)t->base;
@@ -2010,7 +2018,7 @@ local int inflate_fast(
         e &= 15;
         c = t->base + ((uInt)b & inflate_mask[e]);
         DUMPBITS(e)
-        Tracevv((stderr, "inflate:         * length %u\n", c));
+        Tracevv(("inflate:         * length %u\n", c));

         /* decode distance base of block to copy */
         GRABBITS(15);           /* max bits for distance code */
@@ -2024,7 +2032,7 @@ local int inflate_fast(
             GRABBITS(e)         /* get extra bits (up to 13) */
             d = t->base + ((uInt)b & inflate_mask[e]);
             DUMPBITS(e)
-            Tracevv((stderr, "inflate:         * distance %u\n", d));
+            Tracevv(("inflate:         * distance %u\n", d));

             /* do the copy */
             m -= c;
@@ -2069,7 +2077,7 @@ local int inflate_fast(
         if ((e = (t = t->next + ((uInt)b & inflate_mask[e]))->exop) == 0)
         {
           DUMPBITS(t->bits)
-          Tracevv((stderr, t->base >= 0x20 && t->base < 0x7f ?
+          Tracevv((t->base >= 0x20 && t->base < 0x7f ?
                     "inflate:         * literal '%c'\n" :
                     "inflate:         * literal 0x%02x\n", t->base));
           *q++ = (Byte)t->base;
@@ -2079,7 +2087,7 @@ local int inflate_fast(
       }
       else if (e & 32)
       {
-        Tracevv((stderr, "inflate:         * end of block\n"));
+        Tracevv(("inflate:         * end of block\n"));
         UNGRAB
         UPDATE
         return Z_STREAM_END;
diff -x bin -x ash -x klibc-0.81 -purNX /home/olaf/kernel/kernel_exclude.txt linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/openfirmware/coffmain.c linuxppc-2.5_2.6.0-test9-bk/arch/ppc/boot/openfirmware/coffmain.c
--- linuxppc-2.5_2.6.0-test9-bk.orig/arch/ppc/boot/openfirmware/coffmain.c	2003-10-14 13:33:50.000000000 +0200
+++ linuxppc-2.5_2.6.0-test9-bk/arch/ppc/boot/openfirmware/coffmain.c	2003-11-16 17:22:48.000000000 +0100
@@ -60,7 +60,7 @@ void boot(int a1, int a2, void *prom)
 	a1 = initrd_start;
 	a2 = initrd_size;
 	claim(initrd_start, ram_end - initrd_start, 0);
-	printf("initial ramdisk moving 0x%x <- 0x%p (%x bytes)\n\r",
+	printf("initial ramdisk moving 0x%08x <- 0x%p (%08x bytes)\n\r",
 	       initrd_start, (char *)(&__ramdisk_begin), initrd_size);
 	memcpy((char *)initrd_start, (char *)(&__ramdisk_begin), initrd_size);
 	prog_size = initrd_start - prog_start;

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 10:17         ` Benjamin Herrenschmidt
@ 2003-11-16 17:49           ` Kumar Gala
  2003-11-16 22:19             ` Alan Modra
  2003-11-16 23:12             ` Benjamin Herrenschmidt
  2003-11-16 23:04           ` David Edelsohn
  1 sibling, 2 replies; 24+ messages in thread
From: Kumar Gala @ 2003-11-16 17:49 UTC (permalink / raw)
  To: benh; +Cc: Olaf Hering, linuxppc-dev list, Alan Modra, David Edelsohn


If Ben's comments are correct, simply removing -mstring from the options
passed to the build should give the desired behavior.

- kumar

On Nov 16, 2003, at 4:17 AM, Benjamin Herrenschmidt wrote:

>
> On Sun, 2003-11-16 at 09:59, David Edelsohn wrote:
>> 	I didn't mean that lswi cannot take an alignment exception on some
>> PPC implementations, but that lswi is supposed to be able to handle
>> block loads from addresses with arbitrary alignment.
>
> I remember being regularly told (I think by Apple, while I was still
> doing MacOS hacking) that those string instructions were evil,
> deprecated, and should be avoided, as they didn't perform better
> than the equivalent set of load/store instructions. Is this
> still true? If so, we may want to avoid generating them
> from gcc.
>
> Also, if the 601 really does take alignment exceptions on these,
> it's quite bad to have gcc generate them implicitly for memcpy calls,
> since our OFs don't seem to implement an alignment handler for them,
> which breaks our boot wrappers.
>
> Finally, the pem32b at least seems clear about discouraging
> their use, especially for non-aligned accesses. It looks like a
> weird optimisation to do for memcpy.
>
> Ben.
>
>



* Re: kernel oops due to unaligned access with lswi
  2003-11-16 17:49           ` Kumar Gala
@ 2003-11-16 22:19             ` Alan Modra
  2003-11-16 22:45               ` Jon Masters
  2003-11-17  0:50               ` Paul Mackerras
  2003-11-16 23:12             ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 24+ messages in thread
From: Alan Modra @ 2003-11-16 22:19 UTC (permalink / raw)
  To: Kumar Gala; +Cc: benh, Olaf Hering, linuxppc-dev list, David Edelsohn


On Sun, Nov 16, 2003 at 11:49:32AM -0600, Kumar Gala wrote:
> If Ben's comments are correct simply removing -mstring as an option
> passed to the build should get the desired behavior.

Yes.  I can't see any problem with gcc's behaviour here, and I'm
surprised that some processor is taking alignment exceptions on lswi.
book3 ppcas says lswi will generate an alignment exception when "the
operand is in storage that is Write Through Required or Caching
Inhibited, or the processor is in Little-Endian mode".  It can also
happen for operands that cross segment boundaries or page boundaries with
different attributes.

--
Alan Modra
IBM OzLabs - Linux Technology Centre


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 22:19             ` Alan Modra
@ 2003-11-16 22:45               ` Jon Masters
  2003-11-17  0:50               ` Paul Mackerras
  1 sibling, 0 replies; 24+ messages in thread
From: Jon Masters @ 2003-11-16 22:45 UTC (permalink / raw)
  To: Alan Modra; +Cc: linuxppc-dev list



Alan Modra wrote:

| On Sun, Nov 16, 2003 at 11:49:32AM -0600, Kumar Gala wrote:
|
|>If Ben's comments are correct simply removing -mstring as an option
|>passed to the build should get the desired behavior.
|
|
| Yes.  I can't see any problem with gcc's behaviour here, and I'm
| surprised that some processor is taking alignment exceptions on lswi.

The 601 reputedly broke a number of the PowerPC specifications, however,
so I am not surprised if it has this problem - the trouble is you can
fix up the alignment exception handler in Linux but not the firmware
handlers being used prior to that. Anyway, people suggested workarounds.

Jon.



* Re: kernel oops due to unaligned access with lswi
  2003-11-16 10:17         ` Benjamin Herrenschmidt
  2003-11-16 17:49           ` Kumar Gala
@ 2003-11-16 23:04           ` David Edelsohn
  2003-11-17  0:40             ` Paul Mackerras
  2003-11-19 21:51             ` linas
  1 sibling, 2 replies; 24+ messages in thread
From: David Edelsohn @ 2003-11-16 23:04 UTC (permalink / raw)
  To: benh; +Cc: Olaf Hering, Alan Modra, linuxppc-dev list


>>>>> Benjamin Herrenschmidt writes:

Ben> I remember being regularly told (I think by Apple while I was still
Ben> doing MacOS hacking) that those string instructions were evil,
Ben> deprecated, and should be avoided as they weren't performing better
Ben> than the equivalent set of load/store instructions... Is this
Ben> still true? In which case we may want to avoid generating them
Ben> from gcc.

	The information that you received about lswi is overly
simplistic.  The instructions are neither overly good nor overly bad --
they should not be used for everything, but neither should they be avoided
at all cost.  They are particularly good for producing compact code and
preserving the instruction cache.  Remember, programming, including
assembly language programming, is an art.

David


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 17:49           ` Kumar Gala
  2003-11-16 22:19             ` Alan Modra
@ 2003-11-16 23:12             ` Benjamin Herrenschmidt
  2003-11-16 23:31               ` David Edelsohn
  1 sibling, 1 reply; 24+ messages in thread
From: Benjamin Herrenschmidt @ 2003-11-16 23:12 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Olaf Hering, linuxppc-dev list, Alan Modra, David Edelsohn


On Mon, 2003-11-17 at 04:49, Kumar Gala wrote:
> If Ben's comments are correct simply removing -mstring as an option
> passed to the build should get the desired behavior.

We surely don't want them on the G5 at least, as they are microcoded there.

Ben.



* Re: kernel oops due to unaligned access with lswi
  2003-11-16 23:12             ` Benjamin Herrenschmidt
@ 2003-11-16 23:31               ` David Edelsohn
  2003-11-17  9:19                 ` Gabriel Paubert
  0 siblings, 1 reply; 24+ messages in thread
From: David Edelsohn @ 2003-11-16 23:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Kumar Gala, Olaf Hering, linuxppc-dev list, Alan Modra


>>>>> Benjamin Herrenschmidt writes:

Ben> We surely don't want them on G5 at least as they are microcoded

	Again, one cannot approach this as black or white.  What about
optimizing for size?

david


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 23:04           ` David Edelsohn
@ 2003-11-17  0:40             ` Paul Mackerras
  2003-11-19 21:51             ` linas
  1 sibling, 0 replies; 24+ messages in thread
From: Paul Mackerras @ 2003-11-17  0:40 UTC (permalink / raw)
  To: David Edelsohn; +Cc: benh, Olaf Hering, Alan Modra, linuxppc-dev list


David Edelsohn writes:

> 	The information that you received about lswi is overly
> simplistic.  The instructions are neither overly good nor overly bad --
> they should not be used for everything, but neither should they be avoided
> at all cost.  They are particularly good for producing compact code and
> preserving the instruction cache.  Remember, programming, including
> assembly language programming, is an art.

In the experiments that I did comparing different memcpy loops, I
didn't find any combination of alignment, size of copy and processor
implementation where using string loads/stores was as fast as
ordinary loads and stores.  So I am inclined to think that the only
advantage of the string load/store instructions is in saving a little
bit of icache.

Paul.


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 22:19             ` Alan Modra
  2003-11-16 22:45               ` Jon Masters
@ 2003-11-17  0:50               ` Paul Mackerras
  2003-11-17  7:55                 ` Olaf Hering
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Mackerras @ 2003-11-17  0:50 UTC (permalink / raw)
  To: Alan Modra
  Cc: Kumar Gala, benh, Olaf Hering, linuxppc-dev list, David Edelsohn


Alan Modra writes:

> Yes.  I can't see any problem with gcc's behaviour here, and I'm
> surprised that some processor is taking alignment exceptions on lswi.
> book3 ppcas says lswi will generate an alignment exception when "the
> operand is in storage that is Write Through Required or Caching
> Inhibited, or the processor is in Little-Endian mode".  It can also
> happen for operands that cross segment boundaries or page boundaries with
> different attributes.

Well, firstly PPCAS didn't exist when the 601 was designed, and
secondly the architecture allows implementations to consist of a mix
of hardware and software - meaning that hardware can take an exception
on any condition it likes and expect software to fix it up (provided
hardware gives software enough information to fix it up, etc.).

The original PPC architecture talks about one of the reasons for an
alignment interrupt being that "the operand of an elementary string
load or store crosses a protection boundary".  And the 601 manual says
that a string load or store (except lscbx) will only cause an
interrupt if it crosses a page boundary and it is not word aligned.
Olaf, do we know what the source and destination addresses were?

Paul.


* Re: kernel oops due to unaligned access with lswi
  2003-11-17  0:50               ` Paul Mackerras
@ 2003-11-17  7:55                 ` Olaf Hering
  0 siblings, 0 replies; 24+ messages in thread
From: Olaf Hering @ 2003-11-17  7:55 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alan Modra, Kumar Gala, benh, linuxppc-dev list, David Edelsohn


 On Mon, Nov 17, Paul Mackerras wrote:

> Olaf, do we know what the source and destination addresses were?

The source was c0fe4ffa; I can't reproduce it right now to get the
destination address.
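
For reference, the reported source address can be checked against the
601 condition quoted earlier in the thread (a string load or store
traps only if it crosses a page boundary and is not word aligned).
The following is a minimal sketch, assuming 4 KiB pages and the 8-byte
lswi from the disassembly; the helper names are hypothetical and are
not taken from the kernel.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u	/* assumption: 4 KiB pages */

/* Nonzero if the len bytes starting at addr span more than one page. */
static int crosses_page(uint32_t addr, uint32_t len)
{
	return len != 0 &&
	       (addr >> PAGE_SHIFT) != ((addr + len - 1) >> PAGE_SHIFT);
}

/* Nonzero if addr is a multiple of 4. */
static int is_word_aligned(uint32_t addr)
{
	return (addr & 3u) == 0;
}

/*
 * The 601 rule as paraphrased in this thread: a string access may
 * raise an alignment interrupt only when it both crosses a page
 * boundary and starts at a non-word-aligned address.
 */
static int may_trap_on_601(uint32_t addr, uint32_t len)
{
	return crosses_page(addr, len) && !is_word_aligned(addr);
}
```

With addr = 0xc0fe4ffa and len = 8 the access ends at 0xc0fe5001, so it
crosses the page boundary at 0xc0fe5000 and starts 2 bytes past a word
boundary; may_trap_on_601() returns nonzero for it.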

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 23:31               ` David Edelsohn
@ 2003-11-17  9:19                 ` Gabriel Paubert
  0 siblings, 0 replies; 24+ messages in thread
From: Gabriel Paubert @ 2003-11-17  9:19 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Benjamin Herrenschmidt, Kumar Gala, Olaf Hering,
	linuxppc-dev list, Alan Modra


On Sun, Nov 16, 2003 at 06:31:45PM -0500, David Edelsohn wrote:
>
> >>>>> Benjamin Herrenschmidt writes:
>
> Ben> We surely don't want them on G5 at least as they are microcoded
>
> 	Again, one cannot approach this as black or white.  What about
> optimizing for size?

Using lswi/stswi is fine when optimizing for size, but I think
that using these instructions for the case of an 8 byte
move is wrong in most cases because an additional instruction
is often (but not always) needed to compute the address.

Moving 8 bytes with lswi/stswi:

	la rx,src # compute source address
	lswi 5,rx,8
	la ry,dst # compute destination address
	stswi 5,ry,8

Moving 8 bytes with standard instructions:

	lwz rx,src
	lwz ry,src+4
	stw rx,dst
	stw ry,dst+4

IMHO, lswi/stswi should only be used when the move
would be split into 3 or more l[bhw]z/st[bhw] pairs
(i.e., for sizes of 7 or 9 and up).
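
The size threshold above can be sketched in C. This is only an
illustration, assuming the move is split greedily into word, halfword
and byte pairs and that no extra address setup is needed for the
plain loads and stores; both function names are hypothetical.

```c
#include <stddef.h>

/*
 * Number of l[bhw]z/st[bhw] pairs needed to move n bytes when the
 * move is split into as many words as possible, then at most one
 * halfword and at most one byte.
 */
static unsigned int load_store_pairs(size_t n)
{
	unsigned int pairs = (unsigned int)(n / 4);

	if (n & 2)
		pairs++;	/* one halfword pair */
	if (n & 1)
		pairs++;	/* one byte pair */
	return pairs;
}

/* Nonzero if the rule of thumb above would pick lswi/stswi. */
static int prefer_string_insns(size_t n)
{
	return load_store_pairs(n) >= 3;
}
```

Under these assumptions an 8-byte move needs 2 pairs and stays with
lwz/stw, while 7 bytes (word + halfword + byte) and 9 bytes (two words
+ byte) each need 3 pairs, which matches the "7 or 9 and up" rule.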

	Regards,
	Gabriel


* Re: kernel oops due to unaligned access with lswi
  2003-11-16 23:04           ` David Edelsohn
  2003-11-17  0:40             ` Paul Mackerras
@ 2003-11-19 21:51             ` linas
  2003-11-19 22:06               ` Hollis Blanchard
  1 sibling, 1 reply; 24+ messages in thread
From: linas @ 2003-11-19 21:51 UTC (permalink / raw)
  To: David Edelsohn; +Cc: benh, Olaf Hering, Alan Modra, linuxppc-dev list


On Sun, Nov 16, 2003 at 06:04:11PM -0500, David Edelsohn wrote:
>
> >>>>> Benjamin Herrenschmidt writes:
>
> Ben> I remember being regularly told (I think by Apple while I was still
> Ben> doing MacOS hacking) that those string instructions were evil,
> Ben> deprecated, and should be avoided as they weren't performing better
> Ben> than the equivalent set of load/store instructions... Is this
> Ben> still true? In which case we may want to avoid generating them
> Ben> from gcc.
>
> 	The information that you received about lswi is overly
> simplistic.  The instructions are neither overly good nor overly bad --
> they should not be used for everything, but neither should they be avoided
> at all cost.  They are particularly good for producing compact code and
> preserving the instruction cache.  Remember, programming, including
> assembly language programming, is an art.

Back in ye olde dayes, these insns were way better (by a factor of 3x)
for doing load/stores to i/o space, because they could pump out a word
every bus cycle as opposed to every 3 cycles (due to pipeline stalls).

I guess things like the G5 now have enough load/store units and other
hardware that this is no longer an issue?  So that if I wanted
to, I could PIO fast enough to e.g. keep a PCI bus saturated?
(We used PIO in ye olde days for dynamic data that would have gone
stale by the time a DMA was set up and run.)  Just curious; these
insns used to be friends, not enemies.

--linas



* Re: kernel oops due to unaligned access with lswi
  2003-11-19 21:51             ` linas
@ 2003-11-19 22:06               ` Hollis Blanchard
  2003-11-19 22:50                 ` linas
  0 siblings, 1 reply; 24+ messages in thread
From: Hollis Blanchard @ 2003-11-19 22:06 UTC (permalink / raw)
  To: linas; +Cc: linuxppc-dev list


On Wednesday, Nov 19, 2003, at 15:51 US/Central, linas@austin.ibm.com
wrote:
> Just curious; these insns used to be friends, not enemies.

I think they've always been the enemy of the CPU designers. At least
that's how it was explained to me by a Motorolan, and I assume the
same is true for IBM. And they're the ones writing the CPU manuals
telling people not to use the instructions... :)

--
Hollis Blanchard
IBM Linux Technology Center



* Re: kernel oops due to unaligned access with lswi
  2003-11-19 22:06               ` Hollis Blanchard
@ 2003-11-19 22:50                 ` linas
  0 siblings, 0 replies; 24+ messages in thread
From: linas @ 2003-11-19 22:50 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: linuxppc-dev list


On Wed, Nov 19, 2003 at 04:06:15PM -0600, Hollis Blanchard wrote:
> On Wednesday, Nov 19, 2003, at 15:51 US/Central, linas@austin.ibm.com
> wrote:
> > Just curious; these insns used to be friends, not enemies.
>
> I think they've always been the enemy of the CPU designers. At least
> that's how it was explained to me by a Motorolan, and I assume the
> same is true for IBM. And they're the ones writing the CPU manuals
> telling people not to use the instructions... :)

Depends on which CPU designer camp you visit.

DEC Alpha strategy: pump up the clock; minimize number of gate
    delays between start of insn cycle and end of cycle.  Less
    gate delay == faster clock.

Ye olde superscalar strategy: do more per clock cycle by deploying
    more transistors (even if one must have slower clock as a result.)

The load string insn needs a big fat shift register with oodles of
gate delays right in the middle of the load/store path.  No other
insns need or use this register.  Getting rid of it allows you
to pump up the clock.

The original POWER cpu designers clearly thought it was a worthwhile
tradeoff, otherwise it wouldn't have been in the insn set to begin
with.  But the alpha camp sure made a clear and ringing point ...

--linas


end of thread, other threads:[~2003-11-19 22:50 UTC | newest]

Thread overview: 24+ messages
2003-11-15 21:04 kernel oops due to unaligned access with lswi Olaf Hering
2003-11-15 22:24 ` Olaf Hering
2003-11-15 22:30   ` David Edelsohn
2003-11-15 22:37     ` Olaf Hering
2003-11-15 22:43     ` Olaf Hering
2003-11-15 22:59       ` David Edelsohn
2003-11-16 10:17         ` Benjamin Herrenschmidt
2003-11-16 17:49           ` Kumar Gala
2003-11-16 22:19             ` Alan Modra
2003-11-16 22:45               ` Jon Masters
2003-11-17  0:50               ` Paul Mackerras
2003-11-17  7:55                 ` Olaf Hering
2003-11-16 23:12             ` Benjamin Herrenschmidt
2003-11-16 23:31               ` David Edelsohn
2003-11-17  9:19                 ` Gabriel Paubert
2003-11-16 23:04           ` David Edelsohn
2003-11-17  0:40             ` Paul Mackerras
2003-11-19 21:51             ` linas
2003-11-19 22:06               ` Hollis Blanchard
2003-11-19 22:50                 ` linas
2003-11-16  0:40 ` Paul Mackerras
2003-11-16  1:45   ` Olaf Hering
2003-11-16 16:49     ` Olaf Hering
2003-11-16  5:07   ` Benjamin Herrenschmidt
