* Random crashes
@ 2003-08-24 7:24 Giuliano Pochini
2003-08-24 8:46 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 10+ messages in thread
From: Giuliano Pochini @ 2003-08-24 7:24 UTC (permalink / raw)
To: LinuxPPC-dev
I'm having random crashes on a dualG4 windtunnel and v2.4.22p6. Is there
any known issue ?
--
Bye.
Giuliano.
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
* Re: Random crashes
2003-08-24 7:24 Random crashes Giuliano Pochini
@ 2003-08-24 8:46 ` Benjamin Herrenschmidt
2003-08-24 15:44 ` Giuliano Pochini
0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2003-08-24 8:46 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: LinuxPPC-dev
On Sun, 2003-08-24 at 09:24, Giuliano Pochini wrote:
> I'm having random crashes on a dualG4 windtunnel and v2.4.22p6. Is there
> any known issue ?
What do you mean by random crash ? Lockups ? Panics ? See anything in
console or logs ? Is this new to this kernel ?
Ben.
* Re: Random crashes
2003-08-24 8:46 ` Benjamin Herrenschmidt
@ 2003-08-24 15:44 ` Giuliano Pochini
2003-08-24 15:50 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 10+ messages in thread
From: Giuliano Pochini @ 2003-08-24 15:44 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On Sun, 24 Aug 2003 10:46:29 +0200
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > I'm having random crashes on a dualG4 windtunnel and v2.4.22p6. Is there
> > any known issue ?
>
> What do you mean by random crash ? Lockups ? Panics ? See anything in
> console or logs ? Is this new to this kernel ?
Random... lockups, oopses, sig11... but nothing useful because it happens
in random places. Ok, so the answer is no. Maybe the hw is faulty. It's
the only kernel I used on this mac.
--
Bye.
Giuliano.
* Re: Random crashes
2003-08-24 15:44 ` Giuliano Pochini
@ 2003-08-24 15:50 ` Benjamin Herrenschmidt
2003-08-27 20:06 ` Giuliano Pochini
0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2003-08-24 15:50 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: linuxppc-dev
On Sun, 2003-08-24 at 17:44, Giuliano Pochini wrote:
> On Sun, 24 Aug 2003 10:46:29 +0200
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > > I'm having random crashes on a dualG4 windtunnel and v2.4.22p6. Is there
> > > any known issue ?
> >
> > What do you mean by random crash ? Lockups ? Panics ? See anything in
> > console or logs ? Is this new to this kernel ?
>
> Random... lockups, oopses, sig11... but nothing useful because it happens
> in random places. Ok, so the answer is no. Maybe the hw is faulty. It's
> the only kernel I used on this mac.
Strange. I haven't had such problems reported. Can you try an older kernel
just in case ? Could also be bad ram...
Ben.
* Re: Random crashes
2003-08-24 15:50 ` Benjamin Herrenschmidt
@ 2003-08-27 20:06 ` Giuliano Pochini
2003-08-28 8:31 ` Benjamin Herrenschmidt
2003-08-28 17:02 ` linas
0 siblings, 2 replies; 10+ messages in thread
From: Giuliano Pochini @ 2003-08-27 20:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On Sun, 24 Aug 2003 17:50:17 +0200
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > Random... lockups, oopses, sig11... but nothing useful because it happens
> > in random places. Ok, so the answer is no. Maybe the hw is faulty. It's
> > the only kernel I used on this mac.
>
> Strange. I haven't had such problems reported. Can you try an older kernel
> just in case ? Could also be bad ram...
I tried 2.4.22 and I replaced the RAM. Nothing. Digging in the oops
collection I found this one which doesn't look very nice:
Jul 23 21:37:55 localhost kernel: Machine check in kernel mode.
Jul 23 21:37:55 localhost kernel: Caused by (from SRR1=20009030): L1 Data Cache error
I'll send the machine back for repair, although I think they'll not even
notice the problem because it happens sporadically :(((
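For reference, the SRR1 value in that log line can be taken apart by hand. The following is a minimal Python sketch, not the kernel's code. It assumes the classic 32-bit PowerPC MSR bit layout for the low half of SRR1, and it assumes that the cause mask and cause table below match what the 2.4 machine check handler in arch/ppc/kernel/traps.c uses (recalled from memory, not checked against this exact tree). The names decode_srr1, MSR_BITS, CAUSE_MASK and CAUSES are illustrative only.

```python
# Low-half MSR bits that are saved into SRR1 (classic 32-bit PowerPC layout).
MSR_BITS = {
    0x8000: "EE",  # external interrupts enabled
    0x4000: "PR",  # problem state: set means user mode
    0x2000: "FP",  # floating point available
    0x1000: "ME",  # machine check enabled
    0x0020: "IR",  # instruction address translation on
    0x0010: "DR",  # data address translation on
}

# ASSUMPTION: mask and cause strings as recalled from the 2.4-era
# machine check handler; verify against your own kernel source.
CAUSE_MASK = 0x601F0000
CAUSES = {
    0x20000000: "L1 Data Cache error",
    0x40000000: "L1 Instruction Cache error",
    0x00100000: "L2 data cache parity error",
    0x00080000: "Machine check signal",
    0x00020000: "Data parity error signal",
    0x00010000: "Address parity error signal",
}


def decode_srr1(srr1):
    """Return (mode, list of set MSR flag names, cause string)."""
    flags = [name for bit, name in MSR_BITS.items() if srr1 & bit]
    mode = "user" if srr1 & 0x4000 else "kernel"
    cause = CAUSES.get(srr1 & CAUSE_MASK, "unknown")
    return mode, flags, cause


mode, flags, cause = decode_srr1(0x20009030)
print(mode, flags, cause)
```

Under those assumptions, 0x20009030 decodes to kernel mode with EE, ME, IR and DR set and the L1 data cache cause bit, which agrees with the two log lines. It says nothing about whether the hardware's own report is trustworthy.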
--
Bye.
Giuliano.
* Re: Random crashes
2003-08-27 20:06 ` Giuliano Pochini
@ 2003-08-28 8:31 ` Benjamin Herrenschmidt
2003-08-28 13:25 ` Giuliano Pochini
2003-08-28 17:02 ` linas
1 sibling, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2003-08-28 8:31 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: linuxppc-dev
On Wed, 2003-08-27 at 22:06, Giuliano Pochini wrote:
> On Sun, 24 Aug 2003 17:50:17 +0200
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > > Random... lockups, oopses, sig11... but nothing useful because it happens
> > > in random places. Ok, so the answer is no. Maybe the hw is faulty. It's
> > > the only kernel I used on this mac.
> >
> > Strange. I haven't had such problems reported. Can you try an older kernel
> > just in case ? Could also be bad ram...
>
> I tried 2.4.22 and I replaced the RAM. Nothing. Digging in the oops
> collection I found this one which doesn't look very nice:
>
> Jul 23 21:37:55 localhost kernel: Machine check in kernel mode.
> Jul 23 21:37:55 localhost kernel: Caused by (from SRR1=20009030): L1 Data Cache error
>
> I'll send the machine back for repair, although I think they'll not even
> notice the problem because it happens sporadically :(((
Well... I'm not 100% sure the message is correct, though from what you say,
it seems indeed there is a CPU fault...
What CPU is this exactly ? (/proc/cpuinfo)
Ben.
* Re: Random crashes
2003-08-28 8:31 ` Benjamin Herrenschmidt
@ 2003-08-28 13:25 ` Giuliano Pochini
2003-08-28 13:47 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 10+ messages in thread
From: Giuliano Pochini @ 2003-08-28 13:25 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On 28-Aug-2003 Benjamin Herrenschmidt wrote:
>> > Strange. I haven't had such problems reported. Can you try an older kernel
>> > just in case ? Could also be bad ram...
>>
>> I tried 2.4.22 and I replaced the RAM. Nothing. Digging in the oops
>> collection I found this one which doesn't look very nice:
>>
>> Jul 23 21:37:55 localhost kernel: Machine check in kernel mode.
>> Jul 23 21:37:55 localhost kernel: Caused by (from SRR1=20009030): L1 Data Cache error
>>
>> I'll send the machine back for repair, although I think they'll not even
>> notice the problem because it happens sporadically :(((
>
> Well... I'm not 100% sure the message is correct, though from what you say,
> it seems indeed there is a CPU fault...
Yes, but it happened only once. All the others were "normal" segfaults, in both
userspace and kernel space and hard lockups.
> What CPU is this exactly ? (/proc/cpuinfo)
processor : 0
cpu : 7455, altivec supported
clock : 1249MHz
revision : 3.3 (pvr 8001 0303)
bogomips : 1248.46
processor : 1
cpu : 7455, altivec supported
clock : 1249MHz
revision : 3.3 (pvr 8001 0303)
bogomips : 1248.46
total bogomips : 2496.92
machine : PowerMac3,6
motherboard : PowerMac3,6 MacRISC3 Power Macintosh
detected as : 129 (PowerMac G4 Windtunnel)
pmac flags : 00000000
L2 cache : 256K unified
memory : 512MB
pmac-generation : NewWorld
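The "revision" line above packs the processor version register into two 16-bit halves. A minimal Python sketch of splitting it follows. It assumes that the upper half is the processor version, and that the kernel derives the printed "3.3" from the two low bytes as major.minor (which matches this output, but is an assumption about the formatting code). parse_pvr_field and decode_pvr are illustrative names, not kernel functions.

```python
def parse_pvr_field(text):
    """Extract the PVR from a cpuinfo revision value such as
    '3.3 (pvr 8001 0303)' and return it as one 32-bit integer."""
    start = text.index("(pvr") + len("(pvr")
    end = text.index(")", start)
    # The two halves are printed separated by a space; join them.
    return int("".join(text[start:end].split()), 16)


def decode_pvr(pvr):
    """Split a PVR into (version, major, minor).

    ASSUMPTION: major/minor are simply the two low bytes, which is
    consistent with the '3.3' shown for this CPU.
    """
    version = (pvr >> 16) & 0xFFFF
    major = (pvr >> 8) & 0xFF
    minor = pvr & 0xFF
    return version, major, minor


pvr = parse_pvr_field("3.3 (pvr 8001 0303)")
version, major, minor = decode_pvr(pvr)
print(hex(version), major, minor)
```

For the value above this gives version 0x8001, which cpuinfo reports as a 7455, and revision 3.3, so the errata to consult are the ones for that revision.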
I'm reading the latest 7455 errata
http://e-www.motorola.com/files/32bit/doc/errata/MPC7455CE.pdf
but I don't see anything that can cause L1 errors.
Unrelated thing: tlbli instruction can cause problems on 7455 (bug.20).
arch/ppc/kernel/head.S does not use the suggested workaround.
Bye.
Giuliano.
* Re: Random crashes
2003-08-28 13:25 ` Giuliano Pochini
@ 2003-08-28 13:47 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2003-08-28 13:47 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: linuxppc-dev
On Thu, 2003-08-28 at 15:25, Giuliano Pochini wrote:
> On 28-Aug-2003 Benjamin Herrenschmidt wrote:
> >> > Strange. I haven't had such problems reported. Can you try an older kernel
> >> > just in case ? Could also be bad ram...
> >>
> >> I tried 2.4.22 and I replaced the RAM. Nothing. Digging in the oops
> >> collection I found this one which doesn't look very nice:
> >>
> >> Jul 23 21:37:55 localhost kernel: Machine check in kernel mode.
> >> Jul 23 21:37:55 localhost kernel: Caused by (from SRR1=20009030): L1 Data Cache error
> >>
> >> I'll send the machine back for repair, although I think they'll not even
> >> notice the problem because it happens sporadically :(((
> >
> > Well... I'm not 100% sure the message is correct, though from what you say,
> > it seems indeed there is a CPU fault...
>
> Yes, but it happened only once. All the others were "normal" segfaults, in both
> userspace and kernel space and hard lockups.
Did you try a few things like running single CPU and not enabling IRQ
distribution on all CPUs ?
> .../...
>
> Unrelated thing: tlbli instruction can cause problems on 7455 (bug.20).
> arch/ppc/kernel/head.S does not use the suggested workaround.
We don't use tlbli on 745x, only on 603s.
Ben.
* Re: Random crashes
2003-08-27 20:06 ` Giuliano Pochini
2003-08-28 8:31 ` Benjamin Herrenschmidt
@ 2003-08-28 17:02 ` linas
2003-08-29 7:17 ` Giuliano Pochini
1 sibling, 1 reply; 10+ messages in thread
From: linas @ 2003-08-28 17:02 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: Benjamin Herrenschmidt, linuxppc-dev
On Wed, Aug 27, 2003 at 10:06:27PM +0200, Giuliano Pochini wrote:
>
> > > Random... lockups, oopses, sig11... but nothing useful because it happens
> > > in random places. Ok, so the answer is no. Maybe the hw is faulty. It's
> > > the only kernel I used on this mac.
Can you monitor cpu temp somehow? I had this once when a cpu fan would
barely spin.
Slightly off-topic: I've always wanted to have a memory-cache checker
kerneld that ran continuously in the background.
More off-topic. I've always wanted to have an on-line, background fsck
checker running continuously. I've had problems where a journalling
FS would think the FS was fine, log journal clean, but in fact, the fs
had slowly accumulated errors over many months due to ?? faulty
electronics ??. Biting the bullet and spending half a day on fsck
would reveal the problem; but that's an unhappy way to do things.
I guess I've got this jones for systems that run perfectly reliably,
perfectly securely, all the time ...
--linas
* Re: Random crashes
2003-08-28 17:02 ` linas
@ 2003-08-29 7:17 ` Giuliano Pochini
0 siblings, 0 replies; 10+ messages in thread
From: Giuliano Pochini @ 2003-08-29 7:17 UTC (permalink / raw)
To: linas; +Cc: linuxppc-dev, linuxppc-dev, Benjamin Herrenschmidt
On 28-Aug-2003 linas@austin.ibm.com wrote:
> On Wed, Aug 27, 2003 at 10:06:27PM +0200, Giuliano Pochini wrote:
>>
>> > > Random... lockups, oopses, sig11... but nothing useful because it happens
>> > > in random places. Ok, so the answer is no. Maybe the hw is faulty. It's
>> > > the only kernel I used on this mac.
>
> Can you monitor cpu temp somehow? I had this once when a cpu fan would
> barely spin.
I cannot monitor the temperature directly because AFAIK that feature is
not available on the 7455. The g4fan driver by Samuel Rydh says the temp.
is always around 59C. But I don't think this is the cause because I also
tried keeping the fans at full speed (~50C) with the same results.
> Slightly off-topic: I've always wanted to have a memory-cache checker
> kerneld that ran continuously in the background.
>
> More off-topic. I've always wanted to have an on-line, background fsck
> checker running continuously. I've had problems where a journalling
> FS would think the FS was fine, log journal clean, but in fact, the fs
> had slowly accumulated errors over many months due to ?? faulty
> electronics ??.
Small bugs, more likely. All journalling fs are still beta.
Bye.
Giuliano.