* [parisc-linux] Progress
@ 1999-11-18 23:16 John David Anglin
1999-11-19 3:30 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-18 23:16 UTC (permalink / raw)
To: parisc-linux
With today's cvs patches, things are going better. The kernel booted
to the following point:
...
VFS: Mounted root (ext2 filesystem)
Warning: unable to open an initial console.
Attempting to execute '/sbin/init'
It seems to be a valid SOM executable.
I am using the PDC_CONSOLE. I think there was a change needed to allow
opening the PDC_CONSOLE that was discussed previously. Does anybody
remember? Maybe with this change I can get to sash.
I also need to slow down the console messages.
The space registers are all zero.
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress
1999-11-18 23:16 [parisc-linux] Progress John David Anglin
@ 1999-11-19 3:30 ` Philipp Rumpf
1999-11-21 23:07 ` John David Anglin
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-19 3:30 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
> VFS: Mounted root (ext2 filesystem)
> Warning: unable to open an initial console.
> Attempting to execute '/sbin/init'
> It seems to be a valid SOM executable.
>
> I am using the PDC_CONSOLE. I think there was a change needed to allow
> opening the PDC_CONSOLE that was discussed previously. Does anybody
> remember? Maybe with this change I can get to sash.
edit arch/parisc/boot/boot_code/ipl_c.c and replace the "ttyS0" in the command
line with "tty". Alternatively, boot in interactive mode and edit the command
line (again replacing "ttyS0" with "tty").
That should get you to a (most likely unusable) shell prompt.
> I also need to slow down the console messages.
edit arch/parisc/kernel/pdc_cons.c and put a __delay(N); in pdc_putc. N is
approximately the number of CPU cycles you want to wait. I suggest placing
it directly after the "case '\n':" as that gives you a per-line delay instead
of a per-character one.
> The space registers are all zero.
They always are, currently (this needs to, and will, change).
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-19 3:30 ` Philipp Rumpf
@ 1999-11-21 23:07 ` John David Anglin
1999-11-22 1:12 ` John David Anglin
1999-11-22 7:30 ` Philipp Rumpf
0 siblings, 2 replies; 50+ messages in thread
From: John David Anglin @ 1999-11-21 23:07 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: parisc-linux
Hi Philipp,
> > VFS: Mounted root (ext2 filesystem)
> > Warning: unable to open an initial console.
> > Attempting to execute '/sbin/init'
> > It seems to be a valid SOM executable.
> >
> > I am using the PDC_CONSOLE. I think there was a change needed to allow
> > opening the PDC_CONSOLE that was discussed previously. Does anybody
> > remember? Maybe with this change I can get to sash.
>
> edit arch/parisc/boot/boot_code/ipl_c.c and replace the "ttyS0" in the command
> line with "tty". Alternatively, boot in interactive mode and edit the command
> line (again replacing "ttyS0" with "tty").
>
> That should get you to a (most likely unusable) shell prompt.
I am currently booting with "hpux /stand/vmunix".
This doesn't use the boot code in ipl_c.c. Maybe the problem is the failure
in main to open "/dev/console". Possibly, "/dev/console" is not in the ram
disk or, if it is, the kernel is not figuring out which tty to use.
> > I also need to slow down the console messages.
>
> edit arch/parisc/kernel/pdc_cons.c and put a __delay(N); in pdc_putc. N is
> approximately the number of CPU cycles you want to wait. I suggest placing
> it directly after the "case '\n':" as that gives you a per-line delay instead
> of a per-character one.
I tried this and it didn't work. It looks like __delay() is broken. The
".balignl" is clearly bogus. I think the "addib" tests need to be looked
at more closely.
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-21 23:07 ` John David Anglin
@ 1999-11-22 1:12 ` John David Anglin
1999-11-22 8:24 ` Philipp Rumpf
1999-11-22 7:30 ` Philipp Rumpf
1 sibling, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-22 1:12 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp.H.Rumpf, parisc-linux
> > > I also need to slow down the console messages.
> >
> > edit arch/parisc/kernel/pdc_cons.c and put a __delay(N); in pdc_putc. N is
> > approximately the number of CPU cycles you want to wait. I suggest placing
> > it directly after the "case '\n':" as that gives you a per-line delay instead
> > of a per-character one.
>
> I tried this and it didn't work. It looks like __delay() is broken. The
> ".balignl" is clearly bogus. I think the "addib" tests need to be looked
> at more closely.
__delay() in delay.h is ok except for ".balignl". The .balignl inserts
a bunch of "ldi 1a,%r0" instructions which do nothing. I just didn't specify
enough cycles before.
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
--- delay.h.orig Mon Oct 11 14:52:33 1999
+++ delay.h Sun Nov 21 19:29:28 1999
@@ -11,8 +11,7 @@
extern __inline__ void __delay(unsigned long loops) {
asm volatile(
- " .balignl 64,0x34000034
- addib,UV,n -1,%0,.
+ " addib,UV,n -1,%0,.
addib,NUV,n -1,%0,.+8
nop"
: "=r" (loops) : "0" (loops));
* Re: [parisc-linux] Progress
1999-11-21 23:07 ` John David Anglin
1999-11-22 1:12 ` John David Anglin
@ 1999-11-22 7:30 ` Philipp Rumpf
1999-11-22 18:11 ` John David Anglin
1999-11-23 15:59 ` [parisc-linux] Progress Paul Bame
1 sibling, 2 replies; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-22 7:30 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, parisc-linux
> Hi Philipp,
Hi.
> I am currently booting with "hpux /stand/vmunix".
>
> This doesn't use the boot code in ipl_c.c. Maybe the problem is the failure
> in main to open "/dev/console". Possibly, "/dev/console" is not in the ram
> disk or, if it is, the kernel is not figuring out which tty to use.
Don't do that, then :).
Honestly, I don't think we have a way yet to use the command line the HPUX
boot loader passes us (which I was told uses the ANSI C way of passing
arguments, i.e. argument count in GR26, pointer to NULL-terminated array
of pointers to NULL-terminated arrays of strings in GR25). Anyone up to
write some glue code that puts that back together into a simple long string
Linux's commandline splitting code can split again ?
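
The glue asked for here could look roughly like the sketch below. It is only an illustration under the calling convention described in the message (an argument count plus a NULL-terminated array of string pointers). join_args(), its signature, and any sample arguments are hypothetical and are not taken from the parisc-linux tree.

```c
#include <stddef.h>
#include <string.h>

/*
 * Join argv[0..argc-1] into one space-separated, NUL-terminated string.
 * Returns the length written (excluding the NUL), or -1 if buf is too small.
 * Hypothetical helper: name and signature are assumptions for illustration.
 */
static long join_args(int argc, char *const argv[], char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (int i = 0; i < argc && argv[i] != NULL; i++) {
        size_t len = strlen(argv[i]);
        size_t sep = (used > 0) ? 1 : 0;   /* one space between arguments */

        if (used + sep + len + 1 > size)
            return -1;                     /* would overflow the buffer */
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, argv[i], len);
        used += len;
        buf[used] = '\0';
    }
    return (long)used;
}
```

The result is a single string that a command-line splitter can take apart again. The caller has to copy the strings before the memory holding them is reused.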
> > > I also need to slow down the console messages.
> >
> > edit arch/parisc/kernel/pdc_cons.c and put a __delay(N); in pdc_putc. N is
> > approximately the number of CPU cycles you want to wait. I suggest placing
> > it directly after the "case '\n':" as that gives you a per-line delay instead
> > of a per-character one.
>
> I tried this and it didn't work. It looks like __delay() is broken. The
> ".balignl" is clearly bogus.
Oh. Why ? The way it is intended is to align the following code to a 64-byte
boundary (cache lines on current PA2.0 CPUs are 64 bytes, I think) using nops.
> I think the "addib" tests need to be looked at more closely.
Possible. They seem to work fine for the rest of us though (i.e. the delay
loop gets calibrated nicely).
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-22 1:12 ` John David Anglin
@ 1999-11-22 8:24 ` Philipp Rumpf
1999-11-22 9:59 ` Alan Cox
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-22 8:24 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, parisc-linux
> __delay() in delay.h is ok except for ".balignl". The .balignl inserts
> a bunch of "ldi 1a,%r0" instructions which do nothing. I just didn't specify
> enough cycles before.
Yup. They are intended to do nothing to get the following code nicely aligned.
Actually I wonder now whether the best way to implement __delay(x) is:
mfctl 16, %0 ; current interval timer value
addl %0, %1, %1 ; interval timer value we want to reach
subl %1, %0, %0 ; want-is
comb,> %0, 0, .-4 ; while((want-is)>0)
mfctl 16, %0 ; current interval timer value
I actually like this quite a lot;
- should be shorter than the old loop (5 instructions instead of 3
instructions plus alignment)
- should work well for low values (mfctl is quite fast and the rest
is just arithmetic operations - and we don't have any nops in there)
- more exact than other __delays (interrupts, cache effects,
alignment, and, at least in theory, power-saving modes can make
other __delays inexact)
- more exact wrt our timer source (as CR16 actually _is_ our timer
source). This might be a bad thing as it means we don't have a sanity
check for our timer anymore.
> extern __inline__ void __delay(unsigned long loops) {
> asm volatile(
> - " .balignl 64,0x34000034
> - addib,UV,n -1,%0,.
> + " addib,UV,n -1,%0,.
> addib,NUV,n -1,%0,.+8
> nop"
> : "=r" (loops) : "0" (loops));
Just to scare you a bit, have a look at the PCXL ERS, Section 6.4 "Instruction
Lookaside Buffer". This is basically a one-entry TLB that gets set from the
real TLB and takes some time to do so.
Now picture the page boundary happens between the two addibs. This loop will
execute at about a third of the speed of a normal delay loop. The code is
inlined, so only one loop gives you bogus results - if it is the BogoMIPS
calibration loop, udelay(N) will actually only delay for N/3 us, which can
have unexpected effects on hardware we use udelay() for.
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-22 8:24 ` Philipp Rumpf
@ 1999-11-22 9:59 ` Alan Cox
1999-11-22 15:54 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: Alan Cox @ 1999-11-22 9:59 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: dave, Philipp.H.Rumpf, parisc-linux
> Now picture the page boundary happens between the two addibs. This loop will
> execute at about a third of the speed of a normal delay loop. The code is
> inlined, so only one loop gives you bogus results - if it is the BogoMIPS
> calibration loop, udelay(N) will actually only delay for N/3 us, which can
> have unexpected effects on hardware we use udelay() for.
On x86 things like this, and execution timing differences caused by cache
line alignment and other phase-of-moon issues eventually led us to not inline
the function. Even then we had to land it on a 32 byte boundary.
* Re: [parisc-linux] Progress
1999-11-22 9:59 ` Alan Cox
@ 1999-11-22 15:54 ` Philipp Rumpf
0 siblings, 0 replies; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-22 15:54 UTC (permalink / raw)
To: Alan Cox; +Cc: Philipp Heinrich Rumpf, dave, parisc-linux
> On x86 things like this, and execution timing differences caused by cache
> line alignment and other phase-of-moon issues eventually led us to not inline
> the function. Even then we had to land it on a 32 byte boundary.
That's why I'm thinking about switching to a loop not based upon the time
it takes to execute instructions at all (see CR16 loop). x86 doesn't have
the option (well, on Pentiums and above you have).
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-22 7:30 ` Philipp Rumpf
@ 1999-11-22 18:11 ` John David Anglin
1999-11-22 18:27 ` Stan Sieler
1999-11-22 18:33 ` Philipp Rumpf
1999-11-23 15:59 ` [parisc-linux] Progress Paul Bame
1 sibling, 2 replies; 50+ messages in thread
From: John David Anglin @ 1999-11-22 18:11 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: Philipp.H.Rumpf, parisc-linux
> > I tried this and it didn't work. It looks like __delay() is broken. The
> > ".balignl" is clearly bogus.
>
> Oh. Why ? The way it is intended is to align the following code to a 64-byte
> boundary (cache lines on current PA2.0 CPUs are 64 bytes, I think) using nops.
Missed this point because of the strange nop. The current addib loop is 3
instructions. Alignment to a multiple of 16 should be good enough to
ensure that the loop lies within a cache line. This would insert a maximum
of 3 nops before the loop. This would provide a slightly more deterministic
result.
Also, re the BogoMIPS number, I think this should be (loops_per_sec*3)/2000000
(i.e., there is one addib and 0.5 nop instructions per loop when the
number of iterations is large). The number that is currently printed is
loops_per_sec*2/1000000.
The simple loop "addib,NUV,n .;nop" is slower but more deterministic. It
only needs an alignment of 8 (at most one nop). The number of instructions
per loop is 2*N-1.
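
To make the two BogoMIPS formulas above concrete, here is a small sketch that evaluates both for a given loops_per_sec. The function names are made up for illustration; only the two expressions come from the message.

```c
/* Formula currently printed: two instructions counted per loop iteration. */
static unsigned long long bogomips_two_per_loop(unsigned long long loops_per_sec)
{
    return loops_per_sec * 2 / 1000000;
}

/* Proposed formula: 1.5 instructions (one addib plus half a nop) per loop. */
static unsigned long long bogomips_three_halves_per_loop(unsigned long long loops_per_sec)
{
    return loops_per_sec * 3 / 2000000;
}
```

For the same measured loops_per_sec, the second formula reports three quarters of the first.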
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-22 18:11 ` John David Anglin
@ 1999-11-22 18:27 ` Stan Sieler
1999-11-22 18:42 ` Philipp Rumpf
1999-11-22 18:33 ` Philipp Rumpf
1 sibling, 1 reply; 50+ messages in thread
From: Stan Sieler @ 1999-11-22 18:27 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
Re:
> Missed this point because of the strange nop. The current addib loop is 3
> instructions. Alignment to a multiple of 16 should be good enough to
> ensure that the loop lies within a cache line. This would insert a maximum
> of 3 nops before the loop. This would provide a slightly more deterministic
> result.
>
> Also, re the BogoMIPS number, I think this should be (loops_per_sec*3)/2000000
...
This should be timed on a PA-RISC 2.0 machine, just in case. Some of
them are really good at apparently doing things twice as fast as you'd
expect :)
Our K460 gets twice the MIPs you'd expect ... i.e., it executes a simple
timing loop at about 360 million instructions per second ...
on a single CPU machine with a 180 MHz clock!
--
Stan Sieler sieler@allegro.com
www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler
* Re: [parisc-linux] Progress
1999-11-22 18:11 ` John David Anglin
1999-11-22 18:27 ` Stan Sieler
@ 1999-11-22 18:33 ` Philipp Rumpf
1999-11-22 20:55 ` John David Anglin
1 sibling, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-22 18:33 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, parisc-linux
> Missed this point because of the strange nop. The current addib loop is 3
> instructions.
It is ?
The way it looks to me is it is two instructions in the main loop.
> Alignment to a multiple of 16 should be good enough to
> ensure that the loop lies within a cache line.
I agree. Still, I like the CR16 based loop so much better.
> Also, re the BogoMIPS number, I think this should be (loops_per_sec*3)/2000000
> (i.e., there is one addib and 0.5 nop instructions per loop when the
> number of iterations is large).
Again, I disagree. It is two addibs per loop.
> The number that is currently printed is
> loops_per_sec*2/1000000.
We cannot go around and change it either. It is in architecture-independent
code. (It's right, too. If you have "branch if > 0" and "subtract one"
instructions, the number of MIPS is loops_per_sec*2 (2 instructions per
iteration) / 1000000 (the M part)).
> The simple loop "addib,NUV,n .;nop" is slower but more deterministic. It
NUV means no unsigned overflow
-1 + N overflows for N != 0
So, what you really want is "addib,UV -1, %0, .; nop" (or, I think,
"addib,UV,n -1, %0, .") ?
> only needs an alignment of 8 (at most one nop). The number of instructions
> per loop is 2*N-1.
Pardon me, why the "-1" part ? For N==1, you execute
addib,UV -1, %0, . ; -1 + 1 overflows
nop
addib,UV -1, %0, . ; -1 + 0 does not overflow
nop
, so this would imply 2*(N+1) as number of executed instructions.
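
The count above can be checked with a small C model of the "addib,UV -1,%0,. ; nop" loop. This is a sketch, not a cycle-accurate simulation: it assumes the nop in the delay slot executes on every pass (no ",n" nullification) and that the branch is taken exactly when adding 0xFFFFFFFF to the register carries out, i.e. when the register was non-zero. The function name is hypothetical.

```c
#include <stdint.h>

/* Count instructions executed by the modelled loop for a starting value n. */
static unsigned long count_addib_uv_loop(uint32_t n)
{
    uint32_t r = n;
    unsigned long executed = 0;
    int taken;

    do {
        taken = (r != 0);   /* 0xFFFFFFFF + r carries out iff r != 0 */
        r += 0xFFFFFFFFu;   /* addib -1: the register is updated either way */
        executed += 2;      /* the addib plus the nop in its delay slot */
    } while (taken);

    return executed;
}
```

Under these assumptions the model executes 2*(N+1) instructions for a starting value of N.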
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-22 18:27 ` Stan Sieler
@ 1999-11-22 18:42 ` Philipp Rumpf
0 siblings, 0 replies; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-22 18:42 UTC (permalink / raw)
To: Stan Sieler; +Cc: John David Anglin, parisc-linux
> Our K460 gets twice the MIPs you'd expect ... i.e., it executes a simple
> timing loop at about 360 million instructions per second ...
> on a single CPU machine with a 180 MHz clock!
Keep in mind the "subtract one and branch if register non-zero" is really
only a single instruction - and the following one you can always send to
another pipeline because of the two-entry IAOQ thing.
You can see the same effect on many x86 CPUs, some PPCs and probably other
animals as well.
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-22 18:33 ` Philipp Rumpf
@ 1999-11-22 20:55 ` John David Anglin
1999-11-23 11:47 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-22 20:55 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: Philipp.H.Rumpf, parisc-linux
>
> > Missed this point because of the strange nop. The current addib loop is 3
> > instructions.
>
> It is ?
>
> The way it looks to me is it is two instructions in the main loop.
My timing tests on a 735 indicate that the loop with two addib instructions
is only 25% faster than a loop with one addib and one nop. You are correct
that there are only two actual instructions in the loop. However, the timing
measurements indicate that the two addib loop stalls for one instruction
time. Thus, effectively there is an extra nop in your loop. As a result,
my comments re the BogoMIPS calculation below are correct for a 735 PA7100.
I think the stall is on the second addib when the forward branch is not taken.
>
> > Alignment to a multiple of 16 should be good enough to
> > ensure that the loop lies within a cache line.
>
> I agree. Still, I like the CR16 based loop so much better.
>
> > Also, re the BogoMIPS number, I think this should be (loops_per_sec*3)/2000000
> > (i.e., there is one addib and 0.5 nop instructions per loop when the
> > number of iterations is large).
>
> Again, I disagree. It is two addibs per loop.
>
> > The number that is currently printed is
> > loops_per_sec*2/1000000.
>
> We cannot go around and change it either. It is in architecture-independent
> code. (It's right, too. If you have "branch if > 0" and "subtract one"
> instructions, the number of MIPS is loops_per_sec*2 (2 instructions per
> iteration) / 1000000 (the M part)).
[deleted]
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-22 20:55 ` John David Anglin
@ 1999-11-23 11:47 ` Philipp Rumpf
1999-11-23 16:19 ` John David Anglin
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-23 11:47 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, parisc-linux
> My timing tests on a 735 indicate that the loop with two addib instructions
> is only 25% faster than a loop with one addib and one nop.
Who cares about how fast the loop is, actually ?
> You are correct that there are only two actual instructions in the loop.
> However, the timing measurements indicate that the two addib loop stalls
> for one instruction time. Thus, effectively there is an extra nop in your
> loop.
> As a result, my comments re the BogoMIPS calculation below are correct for
> a 735 PA7100. I think the stall is on the second addib when the forward
> branch is not taken.
Forward branches should be predicted not taken.
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-22 7:30 ` Philipp Rumpf
1999-11-22 18:11 ` John David Anglin
@ 1999-11-23 15:59 ` Paul Bame
1999-11-23 16:33 ` John David Anglin
1999-11-23 20:15 ` John David Anglin
1 sibling, 2 replies; 50+ messages in thread
From: Paul Bame @ 1999-11-23 15:59 UTC (permalink / raw)
To: parisc-linux
=
= Honestly, I don't think we have a way yet to use the command line the HPUX
= boot loader passes us (which I was told uses the ANSI C way of passing
= arguments, i.e. argument count in GR26, pointer to NULL-terminated array
= of pointers to NULL-terminated arrays of strings in GR25). Anyone up to
= write some glue code that puts that back together into a simple long string
= Linux's commandline splitting code can split again ?
=
I wrote that glue a while back -- it's in real/setup.c and is #if-0-ed.
Unfortunately the strings are stored
in low-ish physical RAM which is overwritten when our kernel is
loaded. If we want the hpux command line badly enough, we'll have
to load the kernel at a higher physical location. It could then
be copied to 0x10000 after we collect the command line.
One could hard-code a command line in real/setup.c temporarily...
-P
* Re: [parisc-linux] Progress
1999-11-23 11:47 ` Philipp Rumpf
@ 1999-11-23 16:19 ` John David Anglin
1999-11-23 18:03 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-23 16:19 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: Philipp.H.Rumpf, parisc-linux
>
> > My timing tests on a 735 indicate that the loop with two addib instructions
> > is only 25% faster than a loop with one addib and one nop.
>
> Who cares about how fast the loop is, actually ?
You get better control of the number of delay instructions executed. This
is the definition of __delay() that I like:
/*
* __delay(N) executes N+2 or N+3 instructions without any pipeline stalls
* depending on whether it is aligned on an eight byte boundary or not.
*/
extern __inline__ void __delay(unsigned long loops) {
asm volatile(
" .balignl 8,0x34000034
addib,UV -1,%0,.
addi,NUV -1,%0,%0"
: "=r" (loops) : "0" (loops));
}
Increase your BogoMIPS!
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-23 15:59 ` [parisc-linux] Progress Paul Bame
@ 1999-11-23 16:33 ` John David Anglin
1999-11-23 17:05 ` Paul Bame
1999-11-23 20:15 ` John David Anglin
1 sibling, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-23 16:33 UTC (permalink / raw)
To: Paul Bame; +Cc: parisc-linux
> = Honestly, I don't think we have a way yet to use the command line the HPUX
> = boot loader passes us (which I was told uses the ANSI C way of passing
> = arguments, i.e. argument count in GR26, pointer to NULL-terminated array
> = of pointers to NULL-terminated arrays of strings in GR25). Anyone up to
> = write some glue code that puts that back together into a simple long string
> = Linux's commandline splitting code can split again ?
> =
>
> I wrote that glue a while back -- it's in real/setup.c and is #if-0-ed.
> Unfortunately the strings are stored
> in low-ish physical RAM which is overwritten when our kernel is
> loaded. If we want the hpux command line badly enough, we'll have
> to load the kernel at a higher physical location. It could then
> be copied to 0x10000 after we collect the command line.
>
> One could hard-code a command line in real/setup.c temporarily...
How much higher do you think is necessary? Hpux seems to use 0x11000.
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-23 16:33 ` John David Anglin
@ 1999-11-23 17:05 ` Paul Bame
0 siblings, 0 replies; 50+ messages in thread
From: Paul Bame @ 1999-11-23 17:05 UTC (permalink / raw)
To: John David Anglin; +Cc: parisc-linux
= > Unfortunately the strings are stored
= > in low-ish physical RAM which is overwritten when our kernel is
= > loaded. If we want the hpux command line badly enough, we'll have
= > to load the kernel at a higher physical location. It could then
= > be copied to 0x10000 after we collect the command line.
=
= How much higher do you think is necessary? Hpux seems to use 0x11000.
I seem to recall 0x21xxx was where the strings were stored but my
memory is fallible, and printfs are easy to insert into the #if-0-ed
section (#if-1 it too...). This is probably the "free" area at the
end of the hpux boot loader. I think the boot loader text+data size
cannot exceed 256k (0x40000), plus maybe 64k (0x10000) for BSS and
heap, plus the 64k unusable at 0, so maybe 0x60000
would be a good conservative spot?
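
The arithmetic behind the 0x60000 estimate can be written down as follows. The three sizes are the recalled or guessed figures from the message, not measured values, and the macro and function names are invented for this sketch.

```c
/* Sizes as recalled in the message; estimates only. */
#define LOW_RESERVED         0x10000UL  /* 64k unusable at physical address 0 */
#define LOADER_TEXT_DATA_MAX 0x40000UL  /* 256k limit on loader text+data */
#define LOADER_BSS_HEAP_EST  0x10000UL  /* "maybe 64k" for BSS and heap */

/* Lowest address that clears all three regions stacked on top of each other. */
static unsigned long conservative_load_base(void)
{
    return LOW_RESERVED + LOADER_TEXT_DATA_MAX + LOADER_BSS_HEAP_EST;
}
```

The sum is 0x60000, which lies above the 0x21xxx region where the argument strings were recalled to live.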
-P
* Re: [parisc-linux] Progress
1999-11-23 16:19 ` John David Anglin
@ 1999-11-23 18:03 ` Philipp Rumpf
1999-11-23 19:01 ` John David Anglin
1999-11-23 21:11 ` Stan Sieler
0 siblings, 2 replies; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-23 18:03 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, parisc-linux
> /*
> * __delay(N) executes N+2 or N+3 instructions without any pipeline stalls
> * depending on whether it is aligned on an eight byte boundary or not.
> */
>
> extern __inline__ void __delay(unsigned long loops) {
> asm volatile(
> " .balignl 8,0x34000034
> addib,UV -1,%0,.
> addi,NUV -1,%0,%0"
> : "=r" (loops) : "0" (loops));
> }
I'd agree this loop is the nicest of the "real" delay loops so far, but I'm
still unconvinced there's any advantage to using a "real" delay loop over a
CR16-based one.
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-23 18:03 ` Philipp Rumpf
@ 1999-11-23 19:01 ` John David Anglin
1999-11-23 21:11 ` Stan Sieler
1 sibling, 0 replies; 50+ messages in thread
From: John David Anglin @ 1999-11-23 19:01 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: Philipp.H.Rumpf, parisc-linux
> I'd agree this loop is the nicest of the "real" delay loops so far, but I'm
> still unconvinced there's any advantage to using a "real" delay loop over a
> CR16-based one.
I think it depends on what the intended use of __delay() is. If you just
want to wait a few cycles for a device register to update, then the "real"
delay loop should be fine. The CR16-based loop will return faster if
there is some kind of hardware event during the loop. But does it matter?
Usually, you don't care if the delay is longer than specified.
The CR16 loop is probably better for long delays. However, in this case,
we probably should be sleeping instead.
For small delays, the CR16 loop has the same problems as the "real"
loop (cache and page faults, interrupts, etc). The CR16 timer also
has a model dependent rate. My documentation indicates the rate
varies from 0.5 to 2 times the peak instruction rate. The algorithm
is not as tight as the "real" loop.
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-23 15:59 ` [parisc-linux] Progress Paul Bame
1999-11-23 16:33 ` John David Anglin
@ 1999-11-23 20:15 ` John David Anglin
1 sibling, 0 replies; 50+ messages in thread
From: John David Anglin @ 1999-11-23 20:15 UTC (permalink / raw)
To: Paul Bame; +Cc: parisc-linux
> One could hard-code a command line in real/setup.c temporarily...
I think this is in fact reasonable. The default command line
define needs to move from ipl_c.c to bootdata.h. This header can then
be loaded by real/setup.c and the default command line set for the
hpux load. Even if we get the passing of args from hpux working,
it would still be good to have a default command line so I don't have
to type it all the time.
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
* Re: [parisc-linux] Progress
1999-11-23 18:03 ` Philipp Rumpf
1999-11-23 19:01 ` John David Anglin
@ 1999-11-23 21:11 ` Stan Sieler
1999-11-24 10:00 ` Philipp Rumpf
1 sibling, 1 reply; 50+ messages in thread
From: Stan Sieler @ 1999-11-23 21:11 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: parisc-linux
Re:
> I'd agree this loop is the nicest of the "real" delay loops so far, but I'm
> still unconvinced there's any advantage to using a "real" delay loop over a
> CR16-based one.
If interrupts can happen (and get serviced), a CR16 loop is inherently
unreliable. If you're looping until CR16 becomes >= some value, X, then
you might find that CR16 goes: n, n+2, n+3, n-10000, n-9998, ...
I.e., the interrupt servicing could take enough time that the value in
CR16 becomes misleading. If the new CR16 is < the last seen one, is it
simply because it "rolled over", or because an interrupt came in and
you've spent an unknown amount of time doing other things?
OTOH, I like the idea of using a drastic change to CR16 as a signal
that "something happened", and consider using that as a clue
to prematurely exit a counter-based loop. In that scenario, I'd
expect a routine like: delay (int loops, int premature_exit_ok),
which would let the caller decide if a premature (CR16 change based)
exit was allowable.
--
Stan Sieler sieler@allegro.com
www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler
* Re: [parisc-linux] Progress
1999-11-23 21:11 ` Stan Sieler
@ 1999-11-24 10:00 ` Philipp Rumpf
1999-11-24 19:00 ` Stan Sieler
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-24 10:00 UTC (permalink / raw)
To: Stan Sieler; +Cc: Philipp Heinrich Rumpf, parisc-linux
> If interrupts can happen (and get serviced), a CR16 loop is inherently
> unreliable. If you're looping until CR16 becomes >= some value, X, then
> you might find that CR16 goes: n, n+2, n+3, n-10000, n-9998, ...
>
> I.e., the interrupt servicing could take enough time that the value in
> CR16 becomes misleading. If the new CR16 is < the last seen one, is it
> simply because it "rolled over", or because an interrupt came in and
> you've spent an unknown amount of time doing other things?
Look at the loop. What we do is basically
cr16 = mfctl(16);
while(((cr16+loops)-mfctl(16))>0);
Which works well, unless CR16 suddenly changes by 2^31 or more. This
would correspond to 10-20 seconds spent in an interrupt handler which
is unlikely (and will have negative effects on our timer interrupt as
well).
> OTOH, I like the idea of using a drastic change to CR16 as a signal
> that "something happened", and consider using that as a clue
> to prematurely exit a counter-based loop. In that scenario, I'd
> expect a routine like: delay (int loops, int premature_exit_ok),
> which would let the caller decide if a premature (CR16 change based)
> exit was allowable.
And what exactly would be the advantage ?
Philipp Rumpf
* Re: [parisc-linux] Progress
1999-11-24 10:00 ` Philipp Rumpf
@ 1999-11-24 19:00 ` Stan Sieler
1999-11-24 19:33 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: Stan Sieler @ 1999-11-24 19:00 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: parisc-linux
Re:
> Look at the loop. What we do is basically
>
> cr16 = mfctl(16);
> while(((cr16+loops)-mfctl(16))>0);
You definitely don't want to do the above!
Even ignoring the possibility of an interrupt that takes us away
for awhile, there's the simple possibility that cr16 might
roll over during your loop.
(On a 100 MHz machine, with it ticking once per clock,
it rolls over about once every 40 seconds or so.)
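
The roll-over period quoted here is easy to reproduce. The sketch below assumes a free-running counter that is `bits` wide and ticks at `tick_hz`; the function name and parameters are chosen for illustration only, and the 32-bit width is an assumption consistent with the figures in this thread.

```c
#include <stdint.h>

/* Seconds for a free-running counter of the given width to wrap once. */
static double wrap_seconds(unsigned bits, double tick_hz)
{
    /* bits must be below 64 so that the shift stays in range */
    return (double)(UINT64_C(1) << bits) / tick_hz;
}
```

With 32 bits at 100 MHz this gives roughly 43 seconds per full wrap. Half the range (2^31 ticks) takes about 21 seconds at 100 MHz and about 11 seconds at 200 MHz.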
This means you have a non-0 (although low) probability that your
loop may screw you up royally!
For example, if CR16 was 10 ticks away from rolling over, and
you wanted to delay for 9 ticks there's a non-0 probability
that it will rollover in between checks... poof, a 40 second delay
occurs in your loop! (Perhaps more, if you don't happen to grab
the cr16 within an acceptable window of time at the end of the 40 seconds.)
Detailing the above:
cr16 = mfctl (16); (and gets max-10)
cr16 + 9 = max - 1
after 4 to 20 loops, depending upon cr16 implementation on the machine,
you *could* get to:
while ( ((max - 1) - (max - 2)) > 0)
and then loop back and get
while ( ((max - 1) - (0) ) > 0)
The likelihood of this (an observed rollover fouling you up) would be
greatly minimized if you added code to note the delta in each two
successive reads of cr16 and terminating if the delta became negative
or quite large.
> And what exactly would be the advantage ?
Not having sporadic hangs of 40 seconds? (or more)
--
Stan Sieler sieler@allegro.com
www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress
1999-11-24 19:00 ` Stan Sieler
@ 1999-11-24 19:33 ` Philipp Rumpf
1999-11-24 20:55 ` [parisc-linux] Progress - Update John David Anglin
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-24 19:33 UTC (permalink / raw)
To: Stan Sieler; +Cc: Philipp Heinrich Rumpf, parisc-linux
> > Look at the loop. What we do is basically
> >
> > cr16 = mfctl(16);
> > while(((cr16+loops)-mfctl(16))>0);
>
> You definitely don't want to do the above!
>
> Even ignoring the possibility of an interrupt that takes us away
> for awhile, there's the simple possibility that cr16 might
> roll over during your loop.
(cr16+loops) < mfctl(16) does not handle roll-over correctly
((cr16+loops)-mfctl(16)) < 0 does.
> (On a 100 MHz machine, with it ticking once per clock,
> it rolls over about once every 40 seconds or so.)
> This means you have a non-0 (although low) probability that your
> loop may screw you up royally!
No, I haven't. Think about it.
> For example, if CR16 was 10 ticks away from rolling over, and
> you wanted to delay for 9 ticks there's a non-0 probability
> that it will rollover in between checks... poof, a 40 second delay
> occurs in your loop! (Perhaps more, if you don't happen to grab
> the cr16 within an acceptable window of time at the end of the 40 seconds.)
>
> Detailing the above:
>
> cr16 = mfctl (16); (and gets max-10)
> cr16 + 9 = max - 1
cr16+loops   mfctl(16)    ((c+l)-m(16))   ((c+l)-m(16))>0
0xffffffff   0xfffffff5   0x0000000a      true
0xffffffff   0xfffffff6   0x00000009      true
0xffffffff   0xfffffff7   0x00000008      true
...
0xffffffff   0xffffffff   0x00000000      false
0xffffffff   0x00000000   0xffffffff      false
0xffffffff   0x00000001   0xfffffffe      false
...
0xffffffff   0x00010000   0xfffeffff      false
Handles the roll-over just nicely, doesn't it ?
> after 4 to 20 loops, depending upon cr16 implementation on the machine,
> you *could* get to:
> while ( ((max - 1) - (max - 2)) > 0)
> and then loop back and get
> while ( ((max - 1) - (0) ) > 0)
oh, I think I see your problem. Of course, we rely on the (a-b) part of
(a-b)>0 to be signed. (using ">0" for unsigned integers doesn't make much
sense).
Philipp Rumpf
^ permalink raw reply [flat|nested] 50+ messages in thread
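The disagreement above comes down to whether the difference is interpreted as signed. A minimal C sketch of both readings follows, assuming a 32-bit counter and the usual two's-complement conversion. The function names are invented for illustration, and the counter value is passed in as an argument rather than read with mfctl(16).

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe test: is the counter still short of start + loops?
 * The subtraction wraps in unsigned arithmetic (well defined in C) and
 * the result is then reinterpreted as a signed 32-bit value. */
static int still_waiting_signed(uint32_t start, uint32_t loops, uint32_t now)
{
    /* Only meaningful for delays shorter than 2^31 ticks. */
    assert(loops < 0x80000000u);
    return (int32_t)((start + loops) - now) > 0;
}

/* Same expression kept unsigned: "> 0" is true for every value except an
 * exact match, so a missed tick leaves the loop waiting for a full wrap. */
static int still_waiting_unsigned(uint32_t start, uint32_t loops, uint32_t now)
{
    return ((start + loops) - now) > 0u;
}
```

With start = 0xfffffff5 and loops = 10, the signed form stops waiting once the counter reaches 0xffffffff or wraps to 0, as in the table above. The unsigned form keeps waiting after the wrap, which is the long stall Stan describes.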
* Re: [parisc-linux] Progress - Update
1999-11-24 19:33 ` Philipp Rumpf
@ 1999-11-24 20:55 ` John David Anglin
1999-11-24 21:05 ` Philipp Rumpf
` (2 more replies)
0 siblings, 3 replies; 50+ messages in thread
From: John David Anglin @ 1999-11-24 20:55 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: sieler, Philipp.H.Rumpf, parisc-linux
It's hard to believe that my initial post started all this discussion
about delay loops. My original post was about progress in trying to
get to the sash prompt on my 735. Well, we're still not there!
I have modified real/setup.c, boot/boot_code/ipl_c.c and include/asm/bootdata.h
so that a default command line is now set when I boot with the hpux loader.
The default command line now prints during the boot. However, the system
is still not able to open an initial console.
I removed all LASI stuff from my configuration since some of the boot
messages made it seem that it was trying to do things that were bad. It
did find the ASP, however, when the LASI stuff was in.
I am now mainly concerned that some of the memory configuration messages
seem strange. I also got a panic after:
Attempting to execute '/sbin/init'
It seems to be a valid SOM executable.
I got a nice dump of the registers. I think this shows that interruptions
for bus errors are working. The panic occurred at sem_exit+2C when it
tried to access location fffc5e6c. I will try to determine how this address
got used by sem_exit.
This is what I copied down from the boot re memory and the kernel:
548872 + 532480 + 131952
...
Clearing BSS 0x00118550 --> 0x00139370
Free mem starts at 0xc0139370
Available Virtual mapped memory 0xc0139370 - c5000000
Memory : 14768K available (536K kernel code, 1016K data, 64K init) [c0000000, c1000000]
...
initrd : c009e000 - c00f5800
free_area_init : c0143000 c2000000
mem_map = c0143000
...
The initial BSS clear and the start of free memory are correct. However,
the 14768K available and the upper limit for the free_area_init appear
incorrect.
Any thoughts?
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-24 20:55 ` [parisc-linux] Progress - Update John David Anglin
@ 1999-11-24 21:05 ` Philipp Rumpf
1999-11-24 21:27 ` Kirk Bresniker
1999-11-25 0:31 ` [parisc-linux] Progress - Update John David Anglin
1999-11-26 13:40 ` Matthew Wilcox
2 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-24 21:05 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, sieler, parisc-linux
>I have modified real/setup.c, boot/boot_code/ipl_c.c and include/asm/bootdata.h
>so that a default command line is now set when I boot with the hpux loader.
>The default command line now prints during the boot. However, the system
>is still not able to open an initial console.
ramdisk known to be good (which one do you use, md5sum if possible) ?
.config known to be good (attach it) ?
are you up-to-date with the cvs tree (I have no idea which caches the 735 has,
but cache flushes won't harm (well, they do harm performance)) ?
> Available Virtual mapped memory 0xc0139370 - c5000000
> Memory : 14768K available (536K kernel code, 1016K data, 64K init) [c0000000, c1000000]
> ...
> initrd : c009e000 - c00f5800
> free_area_init : c0143000 c2000000
> mem_map = c0143000
> ...
>
> The initial BSS clear and the start of free memory are correct. However,
> the 14768K available and the upper limit for the free_area_init appear
> incorrect.
Only using 16 MB RAM is hard-coded for now. (Not because we can't detect
more, but because we don't want to use more as that might require additional
BTLB entries.).
Philipp Rumpf
^ permalink raw reply [flat|nested] 50+ messages in thread
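The figures in the quoted boot log are consistent with the 16 MB limit mentioned above. A small check in C, using only numbers copied from the log; the identifiers are invented here for illustration.

```c
#include <assert.h>

/* Figures from the "Memory :" line of the boot log, in KB. */
enum {
    AVAILABLE_KB   = 14768,
    KERNEL_CODE_KB = 536,
    DATA_KB        = 1016,
    INIT_KB        = 64
};

/* Total memory accounted for by the "Memory :" line, in KB. */
static unsigned long accounted_kb(void)
{
    return AVAILABLE_KB + KERNEL_CODE_KB + DATA_KB + INIT_KB;
}

/* Size of the [c0000000, c1000000] window from the same line, in KB. */
static unsigned long window_kb(void)
{
    return (0xc1000000UL - 0xc0000000UL) / 1024UL;
}

/* Aborts if the log's figures do not add up to the mapped window. */
static void check_memory_line(void)
{
    assert(accounted_kb() == window_kb());
}
```

Both values come to 16384 KB, so the 14768K figure is what remains of 16 MB after kernel code, data and init are subtracted.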
* Re: [parisc-linux] Progress - Update
1999-11-24 21:05 ` Philipp Rumpf
@ 1999-11-24 21:27 ` Kirk Bresniker
1999-11-24 21:37 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: Kirk Bresniker @ 1999-11-24 21:27 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: dave, sieler, parisc-linux
Philipp,
| are you up-to-date with the cvs tree (I have no idea which caches the 735 has,
| but cache flushes won't harm (well, they do harm performance)) ?
The cache flush instructions are architected to cause a processor-to-memory transfer
iff the cache line is dirty. So while an instruction is still executed, memory
bandwidth is not consumed unless a dirty cache line is referenced.
KMB
--
+============================================================+
| Kirk Bresniker (916) 748-2393 |
| 8000 Foothills Blvd |
| Roseville, CA 95747-5649 |
| kirkb@rose.hp.com |
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-24 21:27 ` Kirk Bresniker
@ 1999-11-24 21:37 ` Philipp Rumpf
1999-11-24 22:38 ` Frank Rowand
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-24 21:37 UTC (permalink / raw)
To: Kirk Bresniker; +Cc: Philipp Heinrich Rumpf, dave, sieler, parisc-linux
>| are you up-to-date with the cvs tree (I have no idea which caches the 735
>| has, but cache flushes won't harm (well, they do harm performance)) ?
>
> The cache flush instructions are architected to cause a processor to memory
> transfer iff the cache line is dirty, while an instruction is executed,
> unless a dirty cache line is referenced, memory bandwidth is not consumed.
At the current rate of cache flushes (way way too high), the main factor wrt
performance is indeed the execution of the instructions.
As soon as we've implemented page colouring (which turns out to be really very
close to large page support), I expect us to get along with very very few
cache flushes - basically I hope we can avoid them completely.
Philipp Rumpf
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-24 21:37 ` Philipp Rumpf
@ 1999-11-24 22:38 ` Frank Rowand
1999-11-24 23:24 ` Cache Flushes Grant Grundler
0 siblings, 1 reply; 50+ messages in thread
From: Frank Rowand @ 1999-11-24 22:38 UTC (permalink / raw)
To: parisc-linux
Philipp Rumpf wrote:
>
> >| are you up-to-date with the cvs tree (I have no idea which caches the 735
> >| has, but cache flushes won't harm (well, they do harm performance)) ?
> >
> > The cache flush instructions are architected to cause a processor to memory
> > transfer iff the cache line is dirty, while an instruction is executed,
> > unless a dirty cache line is referenced, memory bandwidth is not consumed.
>
> At the current rate of cache flushes (way way too high), the main factor wrt
> performance is indeed the execution of the instructions.
>
> As soon as we've implemented page colouring (which turns out to be really very
> close to large page support), I expect us to get along with very very few
> cache flushes - basically I hope we can avoid them completely.
>
> Philipp Rumpf
No such luck. You'll need cache flushing in drivers for non-coherent IO.
-Frank
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: Cache Flushes
1999-11-24 22:38 ` Frank Rowand
@ 1999-11-24 23:24 ` Grant Grundler
0 siblings, 0 replies; 50+ messages in thread
From: Grant Grundler @ 1999-11-24 23:24 UTC (permalink / raw)
To: parisc-linux
Frank Rowand wrote:
> Philipp Rumpf wrote:
...
> > As soon as we've implemented page colouring (which turns out to be really very
> > close to large page support), I expect us to get along with very very few
> > cache flushes - basically I hope we can avoid them completely.
> >
> > Philipp Rumpf
>
> No such luck. You'll need cache flushing in drivers for non-coherent IO.
Frank is (as usual) right. Rule of thumb is if the box doesn't have an
I/O MMU (aka ccio or sba) or is not using it, then it's not I/O coherent.
Every DMA transaction will require flushes/purges before or after
(inbound vs. outbound) of payload and device control data on such boxes.
My understanding is only PA2.0 supports speculative prefetching.
AFAIK that's ok since all PA2.0 boxes have an I/O MMU and the prefetched
data will be recalled/dropped during the course of the DMA.
Conclusion: In the "performance code path", PA1.1 will generally (some
PA1.1 are also I/O coherent) need flushes/purges and PA2.0 won't.
However, I wouldn't code this based on PA2.0 vs PA1.1 since the presence
of U2/Uturn/sba are what matter. And we will probably want to ignore
those chips for PA2.0 bringup (ie bringup will not be I/O coherent).
grant
Grant Grundler
Unix Developement Lab
+1.408.447.7253
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-24 20:55 ` [parisc-linux] Progress - Update John David Anglin
1999-11-24 21:05 ` Philipp Rumpf
@ 1999-11-25 0:31 ` John David Anglin
1999-11-25 1:17 ` Alan Cox
1999-11-26 13:40 ` Matthew Wilcox
2 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-25 0:31 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp.H.Rumpf, sieler, parisc-linux
> I got a nice dump of the registers. I think this shows that interruptions
> for bus errors are working. The panic occurred at sem_exit+2C when it
> tried to access location fffc5e6c. I will try to determine how this address
> got used by sem_exit.
Anybody know where or how the pointer "current" is defined? It is not
in the map. I think it is supposed to point to the current process's task_struct.
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-25 0:31 ` [parisc-linux] Progress - Update John David Anglin
@ 1999-11-25 1:17 ` Alan Cox
1999-11-25 13:24 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: Alan Cox @ 1999-11-25 1:17 UTC (permalink / raw)
To: John David Anglin; +Cc: dave, Philipp.H.Rumpf, sieler, parisc-linux
> Anybody know where or how the pointer "current" is defined? It is not
> in the map. I think it is supposed to point to the current process's task_struct.
current is a macro getting the task struct by manipulating %esp
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-25 1:17 ` Alan Cox
@ 1999-11-25 13:24 ` Philipp Rumpf
1999-11-25 23:47 ` John David Anglin
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-25 13:24 UTC (permalink / raw)
To: Alan Cox; +Cc: John David Anglin, Philipp Heinrich Rumpf, sieler, parisc-linux
> > Anybody know where or how the pointer "current" is defined? It is not
> > in the map. I think it is supposed to point to the current process's task_struct.
include/asm-parisc/current.h
static inline struct task_struct * get_current(void)
{
        struct task_struct *current;
        asm("copy 30,%0" : "=r" (current));
        return (struct task_struct *)((long) current & (~8191));
}
#define current get_current()
> current is a macro getting the task struct by manipulating %esp
GR30 in our case.
Philipp Rumpf
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-25 13:24 ` Philipp Rumpf
@ 1999-11-25 23:47 ` John David Anglin
1999-11-30 18:17 ` John David Anglin
0 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-25 23:47 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: alan, Philipp.H.Rumpf, sieler, parisc-linux
> GR30 in our case.
GR30 (sp) appears ok. It had a value of 0xc019e6c0 when the panic
in sem_exit occurred. The panic occurred in this code:
if ((q = current->semsleeping)) {
if (q->prev)
The value for q (GR20) was 0xfffc5e6c. Thus, the initialization of
the process task structure needs to be looked at.
I built another kernel without IPC. In this one, the kernel panic'd
in get_unused_buffer_head. This one occurred because unused_list was
zero. Again, it looks like there is an initialization problem somewhere.
I think others have reported this problem in the past. It looks like:
bad address 0000001c (code 15)
Kernel panic: bad address
Here is the code:
static struct buffer_head * get_unused_buffer_head(int async)
{
        struct buffer_head * bh;
        recover_reusable_buffer_heads();
        if (nr_unused_buffer_heads > NR_RESERVED) {
                bh = unused_list;
                unused_list = bh->b_next_free;
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
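The get_current() code quoted earlier in the thread clears the low 13 bits of GR30. A minimal sketch of that arithmetic in plain C, applied to the stack pointer value reported above. It assumes the task_struct sits at the base of an 8 KB aligned region that also holds the kernel stack, which is what the 8191 mask suggests; TASK_AREA_SIZE and task_base_from_sp are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TASK_AREA_SIZE 8192u   /* hypothetical name; implied by the 8191 mask */

/* Round a stack pointer down to the start of its 8 KB region. */
static uint32_t task_base_from_sp(uint32_t sp)
{
    /* The mask trick only works when the size is a power of two. */
    assert((TASK_AREA_SIZE & (TASK_AREA_SIZE - 1u)) == 0u);
    return sp & ~(TASK_AREA_SIZE - 1u);
}
```

For sp = 0xc019e6c0 this gives 0xc019e000, so under that assumption `current` itself would be a plausible kernel address and the bad value 0xfffc5e6c would come from the semsleeping field rather than from the pointer computation.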
* Re: [parisc-linux] Progress - Update
1999-11-24 20:55 ` [parisc-linux] Progress - Update John David Anglin
1999-11-24 21:05 ` Philipp Rumpf
1999-11-25 0:31 ` [parisc-linux] Progress - Update John David Anglin
@ 1999-11-26 13:40 ` Matthew Wilcox
2 siblings, 0 replies; 50+ messages in thread
From: Matthew Wilcox @ 1999-11-26 13:40 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Rumpf, sieler, parisc-linux
On Wed, Nov 24, 1999 at 03:55:38PM -0500, John David Anglin wrote:
> The default command line now prints during the boot. However, the system
> is still not able to open an initial console.
>
> I removed all LASI stuff from my configuration since some of the boot
> messages made it seem that it was trying to do things that were bad. It
> did find the ASP, however, when the LASI stuff was in.
You need to enable LASI support to get ASP support. I'll start working
on this again as soon as I can (which may not be very soon given my
imminent change of job and continent). Maybe I'll get a chance at
LinuxTag in Bremen.
--
Matthew Wilcox <willy@bofh.ai>
"Windows and MacOS are products, contrived by engineers in the service of
specific companies. Unix, by contrast, is not so much a product as it is a
painstakingly compiled oral history of the hacker subculture." - N Stephenson
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-25 23:47 ` John David Anglin
@ 1999-11-30 18:17 ` John David Anglin
1999-11-30 18:21 ` Alan Cox
1999-11-30 18:32 ` Philipp Rumpf
0 siblings, 2 replies; 50+ messages in thread
From: John David Anglin @ 1999-11-30 18:17 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp.H.Rumpf, alan, sieler, parisc-linux
> I built another kernel without IPC. In this one, the kernel panic'd
> in get_unused_buffer_head. This one occurred because unused_list was
> zero. Again, it looks like there is an initialization problem somewhere.
> I think others have reported this problem in the past. It looks like:
>
> bad address 0000001c (code 15)
> Kernel panic: bad address
I have found the cause of this panic. The problem occurs when xchg
is called with a pointer which is not aligned on a 16 byte boundary.
The ldcws semaphore instruction is only defined when the address is
aligned on a 16 byte boundary. It is used when the value to exchange
is 0.
The macro xchg is used in quite a few places in what is nominally
machine independent code. Should we add __attribute__ ((aligned (16)))
to the definitions of all variables which exchange with 0? Or, should
we look for another solution?
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-30 18:17 ` John David Anglin
@ 1999-11-30 18:21 ` Alan Cox
1999-11-30 18:32 ` Philipp Rumpf
1 sibling, 0 replies; 50+ messages in thread
From: Alan Cox @ 1999-11-30 18:21 UTC (permalink / raw)
To: John David Anglin; +Cc: dave, Philipp.H.Rumpf, alan, sieler, parisc-linux
> The macro xchg is used in quite a few places in what is nominally
> machine independent code. Should we add __attribute__ ((aligned (16)))
> to the definitions of all variables which exchange with 0? Or, should
> we look for another solution?
xchg() is defined to work for a range of types and sizes. It is up to the
arch code to implement it. That may mean you need to use a spinlock to implement some cases.
Alan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-30 18:17 ` John David Anglin
1999-11-30 18:21 ` Alan Cox
@ 1999-11-30 18:32 ` Philipp Rumpf
1999-11-30 19:31 ` Alan Cox
1 sibling, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-11-30 18:32 UTC (permalink / raw)
To: John David Anglin; +Cc: Philipp Heinrich Rumpf, alan, sieler, parisc-linux
> I have found the cause of this panic. The problem occurs when xchg
> is called with a pointer which is not aligned on a 16 byte boundary.
> The ldcws semaphore instruction is only defined when the address is
> aligned on a 16 byte boundary. It is used when the value to exchange
> is 0.
> The macro xchg is used in quite a few places in what is nominally
> machine independent code. Should we add __attribute__ ((aligned (16)))
> to the definitions of all variables which exchange with 0? Or, should
> we look for another solution?
The problem with all this is xchg needs to be atomic, so there is no easy
and obviously correct way to do it. Probably the most feasible thing is
to use spin_lock_irqsave / spin_unlock_irqrestore on an array of spinlocks
indexed by a hash of the address - x86istic code expecting xchg() to be
fast is going to suck I'd guess.
Philipp Rumpf
^ permalink raw reply [flat|nested] 50+ messages in thread
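The "array of spinlocks indexed by a hash of the address" idea can be sketched in portable C11. This is a user-space illustration only: the table size, the hash and the names are assumptions, and the busy-wait on an atomic_int stands in for spin_lock_irqsave / spin_unlock_irqrestore, which would also have to disable interrupts in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define XCHG_LOCKS 16u                      /* hypothetical table size */

/* Static atomics are zero-initialized, which here means "unlocked". */
static atomic_int xchg_locks[XCHG_LOCKS];

/* Pick a lock from the address; the low 4 bits are dropped so that words
 * in the same 16-byte block always map to the same lock. */
static atomic_int *lock_for(const volatile void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    return &xchg_locks[(a >> 4) % XCHG_LOCKS];
}

/* Exchange *ptr with new_value under the hashed lock; returns the old value. */
static int emulated_xchg(volatile int *ptr, int new_value)
{
    atomic_int *lock;
    int old;

    assert(ptr != NULL);
    lock = lock_for(ptr);
    while (atomic_exchange(lock, 1) != 0)
        ;                                   /* spin until the lock is free */
    old = *ptr;
    *ptr = new_value;
    atomic_store(lock, 0);                  /* release the lock */
    return old;
}
```

Because the lock is chosen from the address alone, the data being exchanged needs no particular alignment; only the lock words would have to satisfy the ldcws requirement in a real implementation.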
* Re: [parisc-linux] Progress - Update
1999-11-30 18:32 ` Philipp Rumpf
@ 1999-11-30 19:31 ` Alan Cox
1999-11-30 20:14 ` Mark Klein
0 siblings, 1 reply; 50+ messages in thread
From: Alan Cox @ 1999-11-30 19:31 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: dave, Philipp.H.Rumpf, alan, sieler, parisc-linux
> to use spin_lock_irqsave / spin_unlock_irqrestore on an array of spinlocks
> indexed by a hash of the address - x86istic code expecting xchg() to be
> fast is going to suck I'd guess.
You can catch the suitably aligned case. I don't think anyone will have a problem
if you submit patches over time so that critical xchg()'d objects are 16-byte
aligned where it doesn't impact other platforms.
Keep counters of aligned/unaligned
^ permalink raw reply [flat|nested] 50+ messages in thread
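Catching the aligned case and keeping counters could look roughly like the following C sketch. The counter and function names are hypothetical, and the plain counters are not safe against concurrent updates; they are only meant to show where the statistics would be collected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned long xchg_aligned_count;    /* hypothetical statistics */
static unsigned long xchg_unaligned_count;

/* Returns 1 if ptr is 16-byte aligned (the case that could use ldcws for
 * an exchange with 0) and 0 otherwise, counting each kind of call. */
static int xchg_note_alignment(const volatile void *ptr)
{
    assert(ptr != NULL);
    if (((uintptr_t)ptr & 15u) == 0) {
        xchg_aligned_count++;
        return 1;
    }
    xchg_unaligned_count++;
    return 0;
}
```

A caller would take the fast path when this returns 1 and fall back to the locked path otherwise; the two counters then show which xchg() users are worth aligning.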
* Re: [parisc-linux] Progress - Update
1999-11-30 19:31 ` Alan Cox
@ 1999-11-30 20:14 ` Mark Klein
1999-11-30 23:40 ` John David Anglin
0 siblings, 1 reply; 50+ messages in thread
From: Mark Klein @ 1999-11-30 20:14 UTC (permalink / raw)
To: Alan Cox, Philipp Rumpf; +Cc: dave, Philipp.H.Rumpf, alan, sieler, parisc-linux
At 07:31 PM 11/30/99 +0000, Alan Cox wrote:
> > to use spin_lock_irqsave / spin_unlock_irqrestore on an array of spinlocks
> > indexed by a hash of the address - x86istic code expecting xchg() to be
> > fast is going to suck I'd guess.
>
>You can catch the suitably aligned case. I don't think anyone will have a
>problem
>if you submit patches over time so that critical xchg()'d objects are 16-byte
>aligned where it doesn't impact other platforms.
You also have another option:
Create the xchg more as a "compare and swap" and separate the semaphore(s) from
the data. You can always keep the semaphores paragraph aligned and not worry
about where the data falls. Here's a sample of something I've used successfully
for many years:
;
; function CompareSwap(var Semaphore : integer;
; CellAddress : localanyptr;
; OldValue : integer;
; NewValue : integer) : integer;
;
;
; Local Register Use Declarations
;
Semaphore .EQU arg0
CellAddress .EQU arg1
OldValue .EQU arg2
NewValue .EQU arg3
;
; Pause Time Declaration
;
PauseTime .EQU 0x3f4cc800
compareswap .proc
.export compareswap, entry
.import PAUSE,code
ldw 0(0,CellAddress),r20 ; dummy load to avoid page fault
ldcws 0(0,Semaphore),ret0 ; load & clear shared memory
ldw 0(0,CellAddress),r20 ; load contents of Cell into reg20
comibf,=,n 1,ret0,spin ; if sem. locked, go to sleep
combf,=,n r20,OldValue,noteq ; go to noteq if Cell <> OldValue
stw NewValue,0(0,CellAddress) ; save new value into cell
bv r0(rp) ; return from whence we came
stw ret0,0(0,Semaphore) ; reset semaphore to avail
noteq ; the case of Cell <> OldValue
stw ret0,0(0,Semaphore) ; reset semaphore to avail
bv r0(rp) ; return from whence we came
or r0,r0,ret0 ; return false
spin
ldi -50,r19 ; Initialize counter
ldw 0(0,Semaphore),r31 ; Prime the pump
spinagain
comibt,= 1,r31,compareswap ; If so, do it again
ldw 0(0,Semaphore),r31 ; Load Semaphore again
addibt,<=,n 1,r19,spinagain ; Try again if needed.
nop
wait
stw rp,-20(sp) ; .ENTER
ldo 56(sp),sp ; .ENTER
stw arg0, -92(0,sp) ; save var Semaphore
stw arg1, -96(0,sp) ; save CellAddress
stw arg2,-100(0,sp) ; save OldValue
stw arg3,-104(0,sp) ; save NewValue
; prepare to call PAUSE
ldil L'PauseTime,r1 ; build sleep time value
ldo R'PauseTime(r1),r31
stw r31,-52(0,sp); ; save it
ldo -52(sp),arg0 ; fetch address of time value
ldil L'PAUSE,r31 ; construct link to PAUSE
.CALL
ble R'PAUSE(sr4,r31); ; sleep
COPY r31,r2 ; set return address
ldw -92(0,sp),arg0 ; restore var Semaphore
ldw -96(0,sp),arg1 ; restore CellAddress
ldw -100(0,sp),arg2 ; restore OldValue
ldw -104(0,sp),arg3 ; restore NewValue
ldw -76(sp),rp ; restore return pointer
b compareswap ; try lock the semaphore again
ldo -56(sp),sp ; pop stack ptr back to previous one
nop
nop
nop
.callinfo caller, save_rp
.exit
.procend
--
Mark Klein DIS International, Ltd.
http://www.dis.com 415-892-8400
PGP Public Key Available
--
^ permalink raw reply [flat|nested] 50+ messages in thread
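For readers who do not read PA-RISC assembly, the core of the routine above can be approximated in C11. This is a sketch under assumptions: atomic_exchange stands in for ldcws (fetch the semaphore and leave 0 behind, so 1 means available), the function name is invented, and the busy case simply returns -1 where the assembly spins and then calls PAUSE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One attempt at compare-and-swap guarded by a separate semaphore word.
 * Returns 1 if *cell equalled old_value and was replaced by new_value,
 * 0 if it held something else, and -1 if the semaphore was busy. */
static int compare_swap_once(atomic_int *semaphore, volatile int *cell,
                             int old_value, int new_value)
{
    int swapped = 0;

    assert(semaphore != NULL && cell != NULL);
    /* Stand-in for ldcws: read the semaphore and clear it in one step. */
    if (atomic_exchange(semaphore, 0) != 1)
        return -1;                          /* someone else holds it */
    if (*cell == old_value) {
        *cell = new_value;
        swapped = 1;
    }
    atomic_store(semaphore, 1);             /* mark the semaphore available */
    return swapped;
}
```

Only the semaphore word needs the 16-byte alignment that ldcws requires; the cell being updated can live at any address.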
* Re: [parisc-linux] Progress - Update
1999-11-30 20:14 ` Mark Klein
@ 1999-11-30 23:40 ` John David Anglin
1999-12-01 15:41 ` Philipp Rumpf
0 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 1999-11-30 23:40 UTC (permalink / raw)
To: Mark Klein; +Cc: alan, Philipp.H.Rumpf, sieler, parisc-linux
>
> At 07:31 PM 11/30/99 +0000, Alan Cox wrote:
>
>
> > > to use spin_lock_irqsave / spin_unlock_irqrestore on an array of spinlocks
> > > indexed by a hash of the address - x86istic code expecting xchg() to be
> > > fast is going to suck I'd guess.
> >
> >You can catch the suitably aligned case. I don't think anyone will have a
> >problem
> >if you submit patches over time so that critical xchg()'d objects are 16-byte
> >aligned where it doesn't impact other platforms.
>
> You also have another option:
>
> Create the xchg more as a "compare and swap" and separate the semaphore(s) from
> the data. You can always keep the semaphores paragraph aligned and not worry
> about where the data falls. Here's a sample of something I've used successfully
> for many years:
[deleted]
Since the exchange operation is not allowed to fail, using an exchange
semaphore to guarantee that the operation is atomic could result in deadlock
if xchg is called from an interrupt service routine. Thus, it appears
that we are stuck with using spin_lock_irqsave / spin_unlock_irqrestore.
Exchange with 0 can be done with ldcws if the pointer is aligned on
a 16 byte boundary. For the present, it is probably best to just use
spin_lock_irqsave / spin_unlock_irqrestore.
Currently, the lock variable used in xchg is static and ignored by
spin_lock_irqsave / spin_unlock_irqrestore. This will have to change
for SMP.
Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-11-30 23:40 ` John David Anglin
@ 1999-12-01 15:41 ` Philipp Rumpf
1999-12-01 18:03 ` Alan Cox
0 siblings, 1 reply; 50+ messages in thread
From: Philipp Rumpf @ 1999-12-01 15:41 UTC (permalink / raw)
To: John David Anglin
Cc: Mark Klein, alan, Philipp Heinrich Rumpf, sieler, parisc-linux
> [deleted]
>
> Since the exchange operation is not allowed to fail, using an exchange
> semaphore to guarantee that the operation is atomic could result in deadlock
> if xchg is called from an interrupt service routine. Thus, it appears
> that we are stuck with using spin_lock_irqsave / spin_unlock_irqrestore.
> Exchange with 0 can be done with ldcws if the pointer is aligned on
> a 16 byte boundary. For the present, it is probably best to just use
> spin_lock_irqsave / spin_unlock_irqrestore.
>
> Currently, the lock variable used in xchg is static and ignored by
> spin_lock_irqsave / spin_unlock_irqrestore. This will have to change
> for SMP.
We don't have SMP versions of spin_lock/spin_unlock yet. (And it doesn't make
sense to write them for 2.2 just to rewrite them for 2.3 later on).
Furthermore we have to care for cases like:
var = 1;
x=0;
CPU0 CPU1
xchg(&var,x); test_and_set_bit(0, &var);
Which means we have to always use the same spinlock for all atomic operations[1]
done to an integer - so either a global lock or an array of spinlocks (if the
one global lock strategy shows this spinlock as a major contention point).
IMHO we really should do the 2.3 merge first.
Philipp Rumpf
[1] - spinlock_t, rwlock_t and atomic_t are special as no-one is going to do
any operations but the ones in spinlock.h / atomic.h on those.
^ permalink raw reply [flat|nested] 50+ messages in thread
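The point about xchg() and test_and_set_bit() sharing a lock can be illustrated with a user-space C11 sketch. The single global lock and all names are assumptions made for the example, and the busy-wait stands in for the kernel's spinlock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int atomic_ops_lock;          /* hypothetical single global lock */

static void ops_lock(void)
{
    while (atomic_exchange(&atomic_ops_lock, 1) != 0)
        ;                                   /* spin until the lock is free */
}

static void ops_unlock(void)
{
    atomic_store(&atomic_ops_lock, 0);
}

/* Exchange *word with new_value; returns the previous contents. */
static unsigned int locked_xchg(volatile unsigned int *word,
                                unsigned int new_value)
{
    unsigned int old;

    assert(word != NULL);
    ops_lock();
    old = *word;
    *word = new_value;
    ops_unlock();
    return old;
}

/* Set bit nr in *word; returns 1 if the bit was already set, else 0. */
static int locked_test_and_set_bit(unsigned int nr, volatile unsigned int *word)
{
    unsigned int mask, old;

    assert(word != NULL && nr < 32u);
    mask = 1u << nr;
    ops_lock();
    old = *word;
    *word = old | mask;
    ops_unlock();
    return (old & mask) != 0;
}
```

If the two operations took different locks, test_and_set_bit could read 1, xchg could then store 0, and test_and_set_bit could write 1 back, leaving var == 1 with the bit reported as already set. Neither serial ordering of the two calls produces that outcome, which is why both must go through the same lock.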
* Re: [parisc-linux] Progress - Update
1999-12-01 15:41 ` Philipp Rumpf
@ 1999-12-01 18:03 ` Alan Cox
1999-12-01 18:29 ` Alex deVries
1999-12-01 18:33 ` John David Anglin
0 siblings, 2 replies; 50+ messages in thread
From: Alan Cox @ 1999-12-01 18:03 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: dave, mklein, alan, Philipp.H.Rumpf, sieler, parisc-linux
>
> Furthermore we have to care for cases like:
>
> var = 1;
> x=0;
> CPU0 CPU1
> xchg(&var,x); test_and_set_bit(0, &var);
Umm. Does anyone actually rely on that ? I'm not sure the non x86 ports have
that property
> IMHO we really should do the 2.3 merge first.
Yep
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-12-01 18:03 ` Alan Cox
@ 1999-12-01 18:29 ` Alex deVries
1999-12-01 18:34 ` Alan Cox
1999-12-02 2:51 ` Philipp Rumpf
1999-12-01 18:33 ` John David Anglin
1 sibling, 2 replies; 50+ messages in thread
From: Alex deVries @ 1999-12-01 18:29 UTC (permalink / raw)
To: Alan Cox; +Cc: Philipp Rumpf, dave, mklein, sieler, parisc-linux
Alan Cox wrote:
> > IMHO we really should do the 2.3 merge first.
> Yep
So when do people think we should do that merge?
- Alex
--
Alex deVries
Vice President of Engineering
The Puffin Group
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-12-01 18:03 ` Alan Cox
1999-12-01 18:29 ` Alex deVries
@ 1999-12-01 18:33 ` John David Anglin
1999-12-01 18:54 ` Alan Cox
1999-12-01 18:55 ` Grant Grundler
1 sibling, 2 replies; 50+ messages in thread
From: John David Anglin @ 1999-12-01 18:33 UTC (permalink / raw)
To: Alan Cox; +Cc: Philipp.H.Rumpf, mklein, alan, sieler, parisc-linux
>
> >
> > Furthermore we have to care for cases like:
> >
> > var = 1;
> > x=0;
> > CPU0 CPU1
> > xchg(&var,x); test_and_set_bit(0, &var);
>
> Umm. Does anyone actually rely on that ? I'm not sure the non x86 ports have
> that property
Also, we may need to flush the cache line for &var. The coding in buffer.c
which caused the original panic in xchg looks dubious in this regard:
/* Update the reuse list */
tail->b_next_free = xchg(&reuse_list, NULL);
reuse_list = bh;
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-12-01 18:29 ` Alex deVries
@ 1999-12-01 18:34 ` Alan Cox
1999-12-02 2:51 ` Philipp Rumpf
1 sibling, 0 replies; 50+ messages in thread
From: Alan Cox @ 1999-12-01 18:34 UTC (permalink / raw)
To: Alex deVries; +Cc: alan, Philipp.H.Rumpf, dave, mklein, sieler, parisc-linux
> Alan Cox wrote:
> > > IMHO we really should do the 2.3 merge first.
> > Yep
>
> So when do people think we should do that merge?
2.3.29 is probably a good candidate for this. It's reasonably solid and has
all the relevant stuff that we need to update to handle. 2.3.30pre looks a bit
shaky right now and chasing pre releases wouldn't be fun
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-12-01 18:33 ` John David Anglin
@ 1999-12-01 18:54 ` Alan Cox
1999-12-01 18:55 ` Grant Grundler
1 sibling, 0 replies; 50+ messages in thread
From: Alan Cox @ 1999-12-01 18:54 UTC (permalink / raw)
To: John David Anglin; +Cc: alan, Philipp.H.Rumpf, mklein, sieler, parisc-linux
> Also, we may need to flush the cache line for &var. The coding in buffer.c
> which caused the original panic in xchg looks dubious in this regard:
>
> /* Update the reuse list */
> tail->b_next_free = xchg(&reuse_list, NULL);
> reuse_list = bh;
spin lock and atomic ops are expected to be write barriers across all
CPUs
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [parisc-linux] Progress - Update
1999-12-01 18:33 ` John David Anglin
1999-12-01 18:54 ` Alan Cox
@ 1999-12-01 18:55 ` Grant Grundler
1 sibling, 0 replies; 50+ messages in thread
From: Grant Grundler @ 1999-12-01 18:55 UTC (permalink / raw)
To: John David Anglin; +Cc: Alan Cox, Philipp.H.Rumpf, mklein, sieler, parisc-linux
"John David Anglin" wrote:
...
> Also, we may need to flush the cache line for &var. The coding in buffer.c
> which caused the original panic in xchg looks dubious in this regard:
>
> /* Update the reuse list */
> tail->b_next_free = xchg(&reuse_list, NULL);
> reuse_list = bh;
If this data struct is only touched by processors, they are coherent
and flushing is not needed. (Except for VM aliases, but I have to ignore
that case due to my ignorance.) If any driver DMAs this data and the
platform is I/O coherent (i.e. has a ccio, epic or sba driver), the processor
doesn't need to flush either. The processor should only need to flush on
*any* (SMP or not) platform which is not I/O coherent (assuming the page
is mapped cacheable).
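The rules above can be sketched as a small predicate. This is only an illustration: struct mem_usage and needs_cpu_flush() are hypothetical names, not kernel interfaces, VM aliases are ignored as in the text, and treating an uncacheable mapping as needing no flush is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how a piece of memory is used. */
struct mem_usage {
    bool touched_by_dma;   /* some driver DMAs to or from this data */
    bool io_coherent;      /* platform has a ccio, epic or sba driver */
    bool mapped_cacheable; /* the page is mapped cacheable */
};

/* Return true when the processor has to flush its cache for this data. */
static bool needs_cpu_flush(const struct mem_usage *u)
{
    assert(u);
    if (!u->touched_by_dma)
        return false;           /* processors are coherent with each other */
    if (u->io_coherent)
        return false;           /* I/O sees processor caches coherently */
    return u->mapped_cacheable; /* assumption: uncached pages need no flush */
}
```

Under these rules the reuse_list case, which only processors touch, would return false.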
grant
Grant Grundler
Unix Development Lab
+1.408.447.7253
* Re: [parisc-linux] Progress - Update
1999-12-01 18:29 ` Alex deVries
1999-12-01 18:34 ` Alan Cox
@ 1999-12-02 2:51 ` Philipp Rumpf
1 sibling, 0 replies; 50+ messages in thread
From: Philipp Rumpf @ 1999-12-02 2:51 UTC (permalink / raw)
To: Alex deVries; +Cc: Alan Cox, Philipp Rumpf, dave, mklein, sieler, parisc-linux
> So when do people think we should do that merge?
I actually just got 2.3.29 to build (not work, but I'm sure it shouldn't take too
long to do that) with the parisc changes hacked in.
I think within the next few days I should have a kernel tree that works as much
as our current tree does, but is based on 2.3.
The other issue is when we want the CVS tree to be moved to 2.3. This mainly
depends on whether people have large uncommitted patch sets that they would
like to finish / commit as long as they at least have an idea how the kernel
behaves without the patch.
So, anyone?
Philipp Rumpf
Thread overview: 50+ messages
1999-11-18 23:16 [parisc-linux] Progress John David Anglin
1999-11-19 3:30 ` Philipp Rumpf
1999-11-21 23:07 ` John David Anglin
1999-11-22 1:12 ` John David Anglin
1999-11-22 8:24 ` Philipp Rumpf
1999-11-22 9:59 ` Alan Cox
1999-11-22 15:54 ` Philipp Rumpf
1999-11-22 7:30 ` Philipp Rumpf
1999-11-22 18:11 ` John David Anglin
1999-11-22 18:27 ` Stan Sieler
1999-11-22 18:42 ` Philipp Rumpf
1999-11-22 18:33 ` Philipp Rumpf
1999-11-22 20:55 ` John David Anglin
1999-11-23 11:47 ` Philipp Rumpf
1999-11-23 16:19 ` John David Anglin
1999-11-23 18:03 ` Philipp Rumpf
1999-11-23 19:01 ` John David Anglin
1999-11-23 21:11 ` Stan Sieler
1999-11-24 10:00 ` Philipp Rumpf
1999-11-24 19:00 ` Stan Sieler
1999-11-24 19:33 ` Philipp Rumpf
1999-11-24 20:55 ` [parisc-linux] Progress - Update John David Anglin
1999-11-24 21:05 ` Philipp Rumpf
1999-11-24 21:27 ` Kirk Bresniker
1999-11-24 21:37 ` Philipp Rumpf
1999-11-24 22:38 ` Frank Rowand
1999-11-24 23:24 ` Cache Flushes Grant Grundler
1999-11-25 0:31 ` [parisc-linux] Progress - Update John David Anglin
1999-11-25 1:17 ` Alan Cox
1999-11-25 13:24 ` Philipp Rumpf
1999-11-25 23:47 ` John David Anglin
1999-11-30 18:17 ` John David Anglin
1999-11-30 18:21 ` Alan Cox
1999-11-30 18:32 ` Philipp Rumpf
1999-11-30 19:31 ` Alan Cox
1999-11-30 20:14 ` Mark Klein
1999-11-30 23:40 ` John David Anglin
1999-12-01 15:41 ` Philipp Rumpf
1999-12-01 18:03 ` Alan Cox
1999-12-01 18:29 ` Alex deVries
1999-12-01 18:34 ` Alan Cox
1999-12-02 2:51 ` Philipp Rumpf
1999-12-01 18:33 ` John David Anglin
1999-12-01 18:54 ` Alan Cox
1999-12-01 18:55 ` Grant Grundler
1999-11-26 13:40 ` Matthew Wilcox
1999-11-23 15:59 ` [parisc-linux] Progress Paul Bame
1999-11-23 16:33 ` John David Anglin
1999-11-23 17:05 ` Paul Bame
1999-11-23 20:15 ` John David Anglin