* Re: [parisc-linux] N Class SMP pb ? (follow up)
@ 2003-09-26 15:46 Joel Soete
2003-09-26 16:08 ` Joel Soete
2003-09-26 16:50 ` Grant Grundler
0 siblings, 2 replies; 11+ messages in thread
From: Joel Soete @ 2003-09-26 15:46 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
>Yes - 6 is ITLB miss and 15 is Data TLB miss.
...
>
>> handle_interruption(26, ...).
>
>26 is "Data Memory Access rights Trap".
>This sounds normal for Copy-On-Write.
Yes. To be sure, I just logged on to a b2k running the same kernel (except
for PDC support, which I already verified makes no difference to the SMP
crash on the N), and indeed it is normal to see many 6, 15 and 26
interruptions.
>> SMP CALL FUNCTION TIMED OUT (CPU=1)
>
>The IPI handler will time out if the other CPU doesn't ack
>the function call within a second. This is bad.
On the contrary, this is the best message I have ever gotten for starting an
analysis of this crash :))
>It means either other CPU never got the interrupt (locked up
>with I-bit off) or the "unstarted_count" isn't coherent between the CPUs.
hmm how could I verify this hypothesis?
>>
>> Could this be a pb with sync between cpu time ref?
>> (because timeout = jiffies + HZ)
>
>I don't think so since jiffies is a global.
>And it's always measured on the same CPU.
Ok
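
For reference, the "timeout = jiffies + HZ" pattern discussed above can be sketched in userspace C as follows. This is only an illustration: HZ is set to an assumed value, and ipi_deadline(), ipi_timed_out() and tick_after() are hypothetical names, not the kernel's own functions. The comparison uses the same wrap-safe idea as the kernel's time_after() macro.

```c
#include <stdbool.h>

#define HZ 100UL  /* assumption: tick rate; the real value is a kernel config choice */

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Deadline one second (HZ ticks) after 'now'. */
unsigned long ipi_deadline(unsigned long now)
{
    return now + HZ;
}

/* True once 'now' has passed 'deadline', even across counter wraparound. */
bool ipi_timed_out(unsigned long now, unsigned long deadline)
{
    return tick_after(now, deadline);
}
```

Because both the deadline and the comparison are computed from the same tick counter, a skew between per-CPU time references would not by itself produce a false timeout in this sketch.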
>
>> I also had a look at where this function is called, but I never see its
>> return code tested in order to trigger a 'stack dump' and stop the system?
>
>You need to find out who is using smp_call_function() and which function
>they are trying to invoke. I suspect it's coming from mm/slab.c but
>wouldn't know which of the three it might be.
Indeed, I don't find any other place where it is called. So I finally added
a printk in each function calling smp_call_function_all_cpus().
That allowed me to see several calls to kmem_tune_cpucache() (7 exactly,
and no others), but I no longer get any 'SMP CALL FUNCTION TIMED OUT (CPU=1)'
:(
(I presume that, as before, the system crashes before it has the opportunity
to flush its buffer?)
What do you think?
Thanks a lot for help,
Joel
-------------------------------------------------------------------------
L'Internet rapide, c'est pour tout le monde. Tiscali ADSL, 19,50 Euro
pendant 3 mois! http://reg.tiscali.be/default.asp?lg=fr
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-09-26 15:46 [parisc-linux] N Class SMP pb ? (follow up) Joel Soete
@ 2003-09-26 16:08 ` Joel Soete
2003-09-26 16:50 ` Grant Grundler
1 sibling, 0 replies; 11+ messages in thread
From: Joel Soete @ 2003-09-26 16:08 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
>
>That is allowing me to notice severall call to kmem_tune_cpucache() (7 exactly)
>(and not other) but don't get any more 'SMP CALL FUNCTION TIMED OUT (CPU=1)'
>:(
>(i presume that, as previously, the system crash before having the opportunity
>to flush its buffer?)
BTW: are there any tips for flushing the buffer before everything else (or
for not buffering console output at all)?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-09-26 15:46 [parisc-linux] N Class SMP pb ? (follow up) Joel Soete
2003-09-26 16:08 ` Joel Soete
@ 2003-09-26 16:50 ` Grant Grundler
2003-09-27 18:16 ` Joel Soete
1 sibling, 1 reply; 11+ messages in thread
From: Grant Grundler @ 2003-09-26 16:50 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Fri, Sep 26, 2003 at 05:46:35PM +0200, Joel Soete wrote:
> >It means either other CPU never got the interrupt (locked up
> >with I-bit off) or the "unstarted_count" isn't coherent between the CPUs.
>
> hmm how could I verify this hypothesis?
TOC the machine, "ser pim" and look at PSW in TOC Info for each CPU.
bit 0 is the I-Bit IIRC.
On second thought, I'm skeptical unstarted_count isn't coherent
since it's a kernel global as well (like jiffies).
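
A minimal sketch of that PSW check, for anyone decoding the dump by hand. It assumes the I-bit is the least-significant bit of the saved PSW word (mask 0x1, which is how the Linux parisc headers define PSW_I); the function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: the I-bit is the least-significant PSW bit (mask 0x1). */
#define PSW_I_MASK UINT64_C(0x1)

/* Returns 1 if external interrupts were enabled in the saved PSW, else 0. */
int psw_interrupts_enabled(uint64_t psw)
{
    return (psw & PSW_I_MASK) != 0;
}
```

A return value of 0 for a CPU would be consistent with the "locked up with I-bit off" case.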
> >You need to find out who is using smp_call_function() and which function
> >they are trying to invoke. I suspect it's coming from mm/slab.c but
> >wouldn't know which of the three it might be.
>
> Effectively I don't find another place where it is called. And so add a
> printk in each function calling smp_call_function_all_cpus() finaly.
>
> That is allowing me to notice severall call to kmem_tune_cpucache() (7 exactly)
> (and not other) but don't get any more 'SMP CALL FUNCTION TIMED OUT (CPU=1)'
> :(
> (i presume that, as previously, the system crash before having the opportunity
> to flush its buffer?)
>
> What do you think?
Could be.
Add mdelay(100) (or higher) after the lines of output you've added.
That works if it's a functional problem that's not timing dependent.
Otherwise setup kernel crash dump and use tools from bruno/phi to view
contents of the kernel message buffer.
grant
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-09-26 16:50 ` Grant Grundler
@ 2003-09-27 18:16 ` Joel Soete
0 siblings, 0 replies; 11+ messages in thread
From: Joel Soete @ 2003-09-27 18:16 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
Hello Grant,
Grant Grundler wrote:
>On Fri, Sep 26, 2003 at 05:46:35PM +0200, Joel Soete wrote:
>
>
>>>It means either other CPU never got the interrupt (locked up
>>>with I-bit off) or the "unstarted_count" isn't coherent between the CPUs.
>>>
>>>
>>hmm how could I verify this hypothesis?
>>
>>
>
>TOC the machine, "ser pim" and look at PSW in TOC Info for each CPU.
>bit 0 is the I-Bit IIRC.
>
>
Here is such a TOC:
PROCESSOR PIM INFORMATION
Original Product Number: A3639C
Current Product Number: A3639C
------- Processor 1 HPMC Information - PDC Version: 41.28 ------
Timestamp = Tue Mar 11 18:07:11 GMT 2003 (20:03:03:11:18:07:11)
HPMC Chassis Codes
Chassis Code Extension
------------ ---------
0x0000082000ff6242 0x0000000000000000
0x1800082011016312 0xcb81000000000000
0x0000087000ff6292 0x000000ffff800000
0x6000082013016062 0x2002000000080000
0x6000082013016072 0x0000000000080000
0x7000082013016082 0x0000000000192200
0x6000082013036062 0x2001000000082004
0x6000082013036072 0x0000000000082000
0x7000082013036082 0x0000000000992600
0x6000082070006062 0x0000000000080000
0x6000082070006072 0x0000000000080000
0x7000082070006082 0x0000000000192200
0x6000082070016062 0x0000000000000800
0x6000082070016072 0x0000000000000800
0x7000082070016082 0x00000000001a4400
0x0000080080006310 0x0000000000000001
0x7000082082006333 0x0000000000b92200
0x7000082082016333 0x0000000000b92200
0x000008008000631f 0x0000000000000000
0x0000082000ff6452 0x0000000000000000
0x0000082000ff6402 0x0000000000000000
0x0000080080006300 0x0000000000000001
0x7000082082006333 0x0000000000b92200
0x7000082382006343 0x0000000000070200
0x7000082382016343 0x0000000000070200
0x7000082382026343 0x0000000000070200
0x7000082382046343 0x0000000000070200
0x7000082382056343 0x0000000000070200
0x7000082382086343 0x0000000000070200
0x70000823820a6343 0x0000000000070200
0x70000823820c6343 0x0000000000070200
0x7000082082016333 0x0000000000b92200
0x7000082382106343 0x0000000000070200
0x7000082382126343 0x0000000000070200
0x7000082382146343 0x0000000000070200
0x7000082382186343 0x0000000000070200
0x70000823821a6343 0x0000000000070200
0x70000823821c6343 0x0000000000070200
0x0000080089006200 0x0000000000000000
0x0000082389006200 0x0000000000000000
0x0000080086006200 0x0000000000000000
0x000008008000630f 0x0000000000000000
General Registers 0 - 31
00-03 0000000000000000 00000000104f6380 000000001014acb4 00000000104f3b80
04-07 000000008f029000 0000000010423688 000000008f0b8000 0000000010000000
08-11 0000000013484f70 0000000013481e48 000000007f0b8b25 000000001054ebc0
12-15 00000000000e1984 000000001054ec20 000000008f0a40c0 000000008f0bf708
16-19 0000000013481e48 0000000000000000 00000000faf005e0 0000000000000580
20-23 000000001054ebc0 00000000002f7465 00000000003f45a2 000fe051ffc07eb8
24-27 000000007f029b27 00000000000e1984 000000008f0a40c0 00000000104f3b80
28-31 000000000007f029 003f81480007f029 000000008f0e4f40 0000000000008ba3
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000016 0000000000000000 00000000000000c0 000000000000002b
12-15 0000000000000000 0000000000000000 0000000000107000 ffe0000000000000
16-19 00000024643cebe8 0000000000000000 000000001014acec 0000000037dd3f61
20-23 0000000000000600 00000000000e1984 000000ff0804c70f c000000000000000
24-27 0000000000427000 000000007f04b000 0000000000041020 000000ffff95c810
28-31 5555555555555555 5555555555555555 000000008f0e4000 0000000010560000
Space Registers 0 - 7
00-03 00000580 00000580 00000000 00000580
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x000000001014acf0
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0010c03b
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0xfffffffffed25000
Floating Point Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 000000001050eec0 00000000104f3b80 0000000000000002 000000001049d248
08-11 00000000104f3b80 0000000000000802 00000000104be588 000000008fac8000
12-15 0000000000000000 0000000000000000 000000001016ace8 00000000103ad6e0
16-19 00000000000009ca 000000008f7cb000 000000000800000f 000000001049d250
20-23 000000001050eec0 00000000104f3b80 00000000003f45a2 000000000000ba2e
24-27 0000999900000000 000099997fac8b70 000000007fac8b78 000000000bebc200
28-31 0000000000000001 00000000ff915e20 0000000010165bf4 00000000104f3b80
Check Summary = 0xcb81000000000000
Available Memory = 0x0000000100000000
CPU Diagnose Register 2 = 0x0301010800802004
CPU Status Register 0 = 0x2640c24000000000
CPU Status Register 1 = 0x8000200000000000
SADD LOG = 0xf8efdb00003fd800
Read Short LOG = 0xc18200ff80000002
----------------- DEW 1 HPMC Information - ------
Timestamp = Tue Mar 11 18:07:11 GMT 2003 (20:03:03:11:18:07:11)
Runway Control Log Reg = 0x00927b0000000000
Runway Address Data Log Reg Odd = 0xc0aa1010c4a61010
Runway Address Data Log Reg Even = 0xc8a61010cca61010
Runway Address Log Reg = 0x00000000000000f4
Runway Broad Error Log Reg = 0x000000000000005c
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_ERROR X X
Merced Bus Requestor Address = 0x0000000000000000
Merced Bus Target Address = 0x0000000000000000
Merced Bus Responder Address = 0x0000000000000000
Merced Error Status Reg = 0x2002000000080000
Merced Error Overflow Reg = 0x0000000000080000
Merced AERR Addr1 Log Reg = 0x00006000ff86fdc0
Merced AERR Addr2 Log Reg = 0x00008000078fff08
Merced DERR Log Reg = 0x0001000000000000
Merced Error Syndrome Reg = 0x00000000000000c0
------- Processor 1 LPMC Information ------------------
Check Type = 0x00000000
IC Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
------- Processor 1 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x0000000000000000
CPU State = 0x00000000
------- Processor 3 HPMC Information - PDC Version: 41.28 ------
Timestamp = Tue Mar 11 18:07:11 GMT 2003 (20:03:03:11:18:07:11)
HPMC Chassis Codes
Chassis Code Extension
------------ ---------
0x0000082000ff6242 0x0000000000000000
0x1800082011036322 0xcb81800000000000
0x0000082000ff6452 0x0000000000000000
0x0000082000ff6402 0x0000000000000000
General Registers 0 - 31
00-03 0000000000000000 0000000010502b80 00000000101161cc 00000000103ef0f8
04-07 000000000800000f 0000000000000002 0000000000000000 00000000104f3b80
08-11 00000000103ef0f8 00000000103ef0f8 000000001038c43c 000000001038af08
12-15 0000000000000001 0000000000000001 0000000000000000 000000001038e004
16-19 000000001038e018 000000008f7cc180 0000000000000002 0000000000000001
20-23 000000000000702c 0000000010423078 00000000104f4380 0000000000000001
24-27 0000000000000116 000000001038c43c 00000000103ef130 00000000104f3b80
28-31 0000000000000000 000000008f0353b0 000000008f0353c0 0000000000008ba3
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000018 0000000000000000 00000000000000c0 000000000000003d
12-15 0000000000000000 0000000000000000 0000000000107000 ffe0000000000000
16-19 000000246412e91b 0000000000000000 00000000101162d0 000000008e605e8d
20-23 0000000000000600 0000000000000000 000000000806060f 0000000000000000
24-27 0000000000427000 000000007f03e000 0000000000041020 000000ffff95c810
28-31 000000ffff95c810 5555555555555555 000000008f034000 0000000000008020
Space Registers 0 - 7
00-03 00000600 00000000 00000000 00000600
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x00000000101162d4
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0030000d
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0xfffffffffed2d000
System Requestor Address = 0x000000fffed2c000
Floating Point Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 000000001050eec0 00000000104f3b80 0000000000000002 000000001049d248
08-11 00000000104f3b80 0000000000000802 00000000104be588 000000008fac8000
12-15 0000000000000000 0000000000000000 000000001016ace8 00000000103ad6e0
16-19 00000000000009ca 000000008f7cb000 000000000800000f 000000001049d250
20-23 000000001050eec0 00000000104f3b80 0000000000000000 000000000000ba2e
24-27 0000999900000000 000099997fac8b70 000000007fac8b78 000000000bebc200
28-31 0000000000000001 00000000ff915e20 0000000010165bf4 00000000104f3b80
Check Summary = 0xcb81800000000000
Available Memory = 0x0000000100000000
CPU Diagnose Register 2 = 0x0301030800802004
CPU Status Register 0 = 0x3640c24000000000
CPU Status Register 1 = 0x8000000000000000
SADD LOG = 0x48e0000000000002
Read Short LOG = 0xc18080ff80080014
----------------- DEW 3 HPMC Information - ------
Timestamp = Tue Mar 11 18:07:11 GMT 2003 (20:03:03:11:18:07:11)
Runway Control Log Reg = 0x0006720000000000
Runway Address Data Log Reg Odd = 0xfffffffffffc3f00
Runway Address Data Log Reg Even = 0xfffffffffffc3f00
Runway Address Log Reg = 0x0000000000000048
Runway Broad Error Log Reg = 0x00000000000000dc
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
X ERR_ERROR X X X
Merced Bus Requestor Address = 0x0000000000000000
Merced Bus Target Address = 0x0000000000000000
Merced Bus Responder Address = 0x0000000000000000
Merced Error Status Reg = 0x2001000000082004
Merced Error Overflow Reg = 0x0000000000082000
Merced AERR Addr1 Log Reg = 0x00c0000000300000
Merced AERR Addr2 Log Reg = 0x0000000000f00000
Merced DERR Log Reg = 0x00c1100000000000
Merced Error Syndrome Reg = 0x0000000052000000
------- Processor 3 LPMC Information ------------------
Check Type = 0x00000000
IC Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
------- Processor 3 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x0000000000000000
CPU State = 0x00000000
-------------- Memory Error Log Information --------------
Bus 0 Log Information
Timestamp = Tue Mar 11 18:07:11 GMT 2003 (20:03:03:11:18:07:11)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_ERROR X X
Bus Requestor Address = 0x0000000000000000
Bus Target Address = 0x0000000000000000
Bus Responder Address = 0x0000000000000000
Error Status Reg = 0x0000000000080000
Error Overflow Reg = 0x0000000000080000
AERR Address 1 Log Reg = 0x0000000000000000
AERR Address 2 Log Reg = 0xf800000000000000
FERR Log Reg = 0x0000000000000000
DERR Log Reg = 0x000133000051cdc0
Error Syndrome Reg = 0x0000000000000000
Address/Control Parity Error Registers
Address/Control Parity Error Bit (AE) Not Set
Bus 1 Log Information
Timestamp = Tue Mar 11 18:07:11 GMT 2003 (20:03:03:11:18:07:11)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_TIMEOUT X X
Bus Requestor Address = 0xfffffffffed2c000
Bus Target Address = 0x00000000f000a000
Bus Responder Address = 0x0000000000000000
Error Status Reg = 0x0000000000000800
Error Overflow Reg = 0x0000000000000800
AERR Address 1 Log Reg = 0x08006000f000a000
AERR Address 2 Log Reg = 0x6000b0003f700a10
FERR Log Reg = 0x0000000000000000
DERR Log Reg = 0x0000000000000000
Error Syndrome Reg = 0x0000000000000000
Address/Control Parity Error Registers
Address/Control Parity Error Bit (AE) Not Set
------------ I/O Module Error Log Information ------------
Summary of IO subsystem log entries
-----------------------------------
Description Phys Loc (hex) Vendor Id Device Id Severity (CORR UNC FE CW)
----------- -------------- --------- --------- ------------------------
System Bus Adapter SB 0x000000ffffffff82 0x103c 0x1050 X
System Bus Adapter RP 0x000000ffff0dff83 0x103c 0x1051 X
System Bus Adapter RP 0x000000ffff0eff83 0x103c 0x1051 X
System Bus Adapter RP 0x000101ffff06ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000101ffff02ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000101ffff01ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000101ffff04ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000101ffff05ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000101ffff03ff83 0x103c 0x1051 X
System Bus Adapter SB 0x000000ffffffff82 0x103c 0x1050 X
System Bus Adapter RP 0x000202ffff0cff83 0x103c 0x1051 X
System Bus Adapter RP 0x000202ffff0aff83 0x103c 0x1051 X
System Bus Adapter RP 0x000202ffff09ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000202ffff0bff83 0x103c 0x1051 X
System Bus Adapter RP 0x000202ffff08ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000202ffff07ff83 0x103c 0x1051 X
Detail display of IO subsystem log entries
------------------------------------------
System Bus Adapter -- System Bus Interface
------------------------------------------
Timestamp = Tue Mar 11 18:09:10 GMT 2003 (20:03:03:11:18:09:10)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
X X ERR_ERROR X X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0xfffffffffed00000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000007ff0034
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Tue Mar 11 18:09:12 GMT 2003 (20:03:03:11:18:09:12)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000000ffff0dff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Tue Mar 11 18:09:12 GMT 2003 (20:03:03:11:18:09:12)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000000ffff0eff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Tue Mar 11 18:09:12 GMT 2003 (20:03:03:11:18:09:12)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000101ffff06ff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Tue Mar 11 18:09:12 GMT 2003 (20:03:03:11:18:09:12)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000101ffff02ff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Tue Mar 11 18:09:12 GMT 2003 (20:03:03:11:18:09:12)
[...]
Well, that is from an older test, but I don't know yet which field would be
the PSW (sorry, I haven't found more documentation about the TOC output).
>On second thought, I'm skeptical unstarted_count isn't coherent
>since it's a kernel global as well (like jiffies).
>
>
>
>>>You need to find out who is using smp_call_function() and which function
>>>they are trying to invoke. I suspect it's coming from mm/slab.c but
>>>wouldn't know which of the three it might be.
>>>
>>>
>>Effectively I don't find another place where it is called. And so add a
>>printk in each function calling smp_call_function_all_cpus() finaly.
>>
>>That is allowing me to notice severall call to kmem_tune_cpucache() (7 exactly)
>>(and not other) but don't get any more 'SMP CALL FUNCTION TIMED OUT (CPU=1)'
>>:(
>>(i presume that, as previously, the system crash before having the opportunity
>>to flush its buffer?)
>>
>>What do you think?
>>
>>
>
>Could be.
>Add mdelay(100) (or higher) after the lines of output you've added.
>That works if it's a functional problem that's not timing dependent.
>
>
Because during another test I managed to boot this N in SMP (well, only for
half an hour), I am quite sure there is such a problem somewhere
(the difficulty is finding where).
>Otherwise setup kernel crash dump and use tools from bruno/phi to view
>contents of the kernel message buffer.
>
I already thought about this (because I have tested several of bruno's
patches), but I have two problems implementing it:
a) my system has 2 GB (4 * 512 MB IIRC) of RAM and I don't see how to
reconfigure the disk with at least 2 GB of swap (== dump area IIRC)?
The disk slicing is:
Name   Flags   Part Type   FS Type [Label]       Size (MB)
------------------------------------------------------------------------------
sda1   Boot    Primary     Linux/PA-RISC boot        67.56
sda2           Primary     Linux swap               135.11
sda3           Primary     Linux ext3               130.89
sda5           Logical     Linux ext3              1760.56
sda6           Logical     Linux ext3               261.77
sda7           Logical     Linux ext3               130.89
sda8           Logical     Linux ext3               130.89
sda9           Logical     Linux ext3              1574.79
sda5, being the root fs, must lie within the 2 GB limit IIRC, but I am not
quite sure that swap also has to be within that limit (in fact it is laid
out like this only because of the very first, now obsolete, puffin :)
install instructions)?
b) AFAIK p4 is not yet publicly released?
Thanks in advance for your additional help,
Joel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
@ 2003-10-01 6:48 Joel Soete
2003-10-01 17:20 ` Joel Soete
0 siblings, 1 reply; 11+ messages in thread
From: Joel Soete @ 2003-10-01 6:48 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
>>
>> In summary:
>> ------- Processor 1 HPMC Information - PDC Version: 41.28 ------
>
>Did you TOC the machine or did it HPMC?
>I was under the impression the SW had hung and one needed to TOC
>to regain control. TOC info is separate from HPMC info.
Exactly, but the TOC info only contains zeros, so I supposed that the system
actually takes an HPMC. However, it does not seem to be handled by
handle_interruption(): at its beginning I put a printk() which was supposed
to print the 'code' value. To be more precise:
[...]
struct siginfo si;
printk(KERN_ERR "%s(%d, ...).\n", __FUNCTION__, code);
mdelay(100);
[...]
which lets me see a lot of 6, 15 and 26 codes, but never 1?
>
>If it's in fact HPMC, then look at IOAQ/GR02 for both CPUs
>and see which functions they were executing in when HPMC occurred.
which were for cpu[1]:
GR[02] == rp = 000000001014dbf0
Func: zap_page_range, Off: 0xe0, Addr: 0x1014dbf0
1014dbf0: 08 0e 02 5b copy r14,dp
1014dbf4: 03 c0 08 b4 mfctl tr6,r20
1014dbf8: 4a 93 00 b0 ldw 58(r20),r19
1014dbfc: 29 c5 20 00 addil b000,r14,%r1
[...]
Parse IAOQ = 0x000000001014dea0 for CPU[1]
Func: zap_page_range, Off: 0x390, Addr: 0x1014dea0
1014dea0: 06 a0 52 00 pdtlb r0(sr1,r21)
1014dea4: 37 39 3f ff ldo -1(r25),r25
1014dea8: bf 33 3f e5 cmpb,*<> r19,r25,1014dea0 <zap_page_range+0x390>
1014deac: 36 b5 20 00 ldo 1000(r21),r21
And for cpu[3]:
GR[02] == rp = 000000001010cdd0
Func: handle_interruption, Off: 0xb0, Addr: 0x1010cdd0
1010cdd0: 08 05 02 5b copy r5,dp
1010cdd4: 02 00 08 b4 mfctl itmr,r20
1010cdd8: 02 00 08 b3 mfctl itmr,r19
1010cddc: 0a 93 04 33 sub r19,r20,r19
...
Parse IAOQ = 0x000000001010cde4 for CPU[3]
Func: handle_interruption, Off: 0xc4, Addr: 0x1010cde4
1010cde0: be 7c bf e5 cmpb,*>> ret0,r19,1010cdd8 <handle_interruption+0xb8>
1010cde4: 08 00 02 40 nop
1010cde8: 34 63 3f ff ldo -1(r3),r3
1010cdec: ec 7f bf c5 cmpib,*<> -1,r3,1010cdd4 <handle_interruption+0xb4>
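
The function/offset pairs in these listings come from resolving an address against the kernel symbol table. A minimal sketch of that lookup follows; the two start addresses are derived from this message as Addr minus Off (for example 0x1014dbf0 - 0xe0), and the struct and function names are hypothetical, not part of any existing tool.

```c
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint64_t start;     /* first address of the function */
    const char *name;
};

/* Start addresses derived from this thread as Addr - Off; sorted by start. */
static const struct ksym symtab[] = {
    { UINT64_C(0x1010cd20), "handle_interruption" },
    { UINT64_C(0x1014db10), "zap_page_range" },
};

/*
 * Find the last symbol whose start is <= addr and store the offset in *off.
 * Returns NULL if addr lies below the first symbol. There is no upper bound
 * check, so addresses past the last symbol resolve to that last symbol.
 */
const struct ksym *resolve_addr(uint64_t addr, uint64_t *off)
{
    const struct ksym *best = NULL;
    size_t i;

    for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].start <= addr)
            best = &symtab[i];
    }
    if (best && off)
        *off = addr - best->start;
    return best;
}
```

With a full System.map loaded into the table, the same lookup applies to GR02 (rp) and IAOQ values from any PIM dump.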
Am I wrong to presume that the nop insn is harmless on cpu[3], unlike
'pdtlb r0(sr1,r21)' on cpu[1]? But I do not see any code 10 printed by
printk(), although that would be its only exception: Privileged operation trap.
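
To make the printk() output easier to read, the interruption codes mentioned in this thread can be mapped to names. The sketch below covers only those five codes, and the function name is hypothetical.

```c
#include <stddef.h>

/* Names for the interruption codes mentioned in this thread only. */
const char *interruption_name(int code)
{
    switch (code) {
    case 1:  return "High-priority machine check (HPMC)";
    case 6:  return "Instruction TLB miss";
    case 10: return "Privileged operation trap";
    case 15: return "Data TLB miss";
    case 26: return "Data memory access rights trap";
    default: return NULL; /* not discussed in this thread */
    }
}
```

Returning NULL for every other code keeps the table from claiming more than the thread states.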
Thanks again,
Joel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-10-01 6:48 Joel Soete
@ 2003-10-01 17:20 ` Joel Soete
0 siblings, 0 replies; 11+ messages in thread
From: Joel Soete @ 2003-10-01 17:20 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
Hi Grant,
I also noticed some additional info:
a) in SL (GSP console) I grabbed several messages like:
Log Entry # 0 :
SYSTEM NAME: ap8002
DATE: 10/01/2003 TIME: 16:32:35
ALERT LEVEL: 2 = Non-Urgent operator attention required
SOURCE: 8 = I/O
SOURCE DETAIL: 2 = system bus adapter SOURCE ID: 1
PROBLEM DETAIL: 0 = no problem detail
CALLER ACTIVITY: 6 = machine check STATUS: 3
CALLER SUBACTIVITY: 33 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 01
0x7000102082016333 00000000 00B92200 type 14 = Problem Detail
0x5800182082016333 00006709 01102023 type 11 = Timestamp 10/01/2003 16:32:35
Type CR for next entry, Q CR to quit.
This seems to indicate an I/O problem (but I don't know how relevant these
entries are, given the 'implementation dependent' subactivity).
b) at the end of the PIM info I also noticed:
[...]
------------ I/O Module Error Log Information ------------
Summary of IO subsystem log entries
-----------------------------------
Description Phys Loc (hex) Vendor Id Device Id Severity (CORR UNC FE CW)
----------- -------------- --------- --------- ------------------------
System Bus Adapter SB 0x000000ffffffff82 0x103c 0x1050 X
System Bus Adapter SB 0x000000ffffffff82 0x103c 0x1050 X
Detail display of IO subsystem log entries
------------------------------------------
System Bus Adapter -- System Bus Interface
------------------------------------------
Timestamp = Wed Oct 1 16:32:31 GMT 2003 (20:03:10:01:16:32:31)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
X X ERR_ERROR X X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0xfffffffffed00000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000007ff0034
System Bus Adapter -- System Bus Interface
------------------------------------------
Timestamp = Wed Oct 1 16:32:31 GMT 2003 (20:03:10:01:16:32:31)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
X X ERR_ERROR X X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0xfffffffffed40000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff01
Module Error Register = 0x0000000007ff0034
[...]
And "IO Responder Address = 0xfffffffffed40000" matches the boot log entry:
Found devices:
[...]
11. IKE I/O Bus Converter Merced Port (7) at 0xfffffffffed40000 [1], versions 0x803, 0x0, 0xc
And "IO Responder Address = 0xfffffffffed00000" matches:
2. IKE I/O Bus Converter Merced Port (7) at 0xfffffffffed00000 [0], versions 0x803, 0x0, 0xc
Could this be the source of the crash problem?
Thanks in advance,
Joel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
@ 2003-09-30 16:31 Joel Soete
2003-09-30 18:50 ` Grant Grundler
0 siblings, 1 reply; 11+ messages in thread
From: Joel Soete @ 2003-09-30 16:31 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
Hi Grant,
Here is the very last test I did yesterday with the additional mdelay(100):
>TOC the machine, "ser pim" and look at PSW in TOC Info for each CPU.
>bit 0 is the I-Bit IIRC.
In summary:
------- Processor 1 HPMC Information - PDC Version: 41.28 ------
[...]
CPU State = 0x9e000004
[...]
CPU Diagnose Register 2 = 0x0301010800802004
CPU Status Register 0 = 0x2640c24000000000
CPU Status Register 1 = 0x8000200000000000
[...]
------- Processor 3 HPMC Information - PDC Version: 41.28 ------
[...]
CPU State = 0x9e000004
[...]
CPU Diagnose Register 2 = 0x0301030800802004
CPU Status Register 0 = 0x3640c24000000000
CPU Status Register 1 = 0x80000000
0000000
[...]
All the I bits (well, the lowest weight PSW bit :) ) are indeed 0.
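For a mechanical cross-check, the I bit can be masked out of a PSW word. A minimal sketch, assuming the word to inspect is the IPSW (CR[22]) from the PIM analysis further down in this mail. The IPSW is the PSW saved at the interruption, so it may differ from the PSW word shown in the TOC info. The mask 0x1 follows the "lowest weight bit" reading above, and the helper name i_bit is made up for this example.

```python
PSW_I = 0x00000001  # external interrupt enable: the least significant PSW bit

# CR[22] (ipsw) values copied from the PIM analysis later in this mail.
IPSW = {
    1: 0x000000FF080EF70F,  # CPU[1]
    3: 0x000000FF0804FF0E,  # CPU[3]
}


def i_bit(psw):
    """Return 1 if the I bit is set in the given PSW word, else 0."""
    return psw & PSW_I


if __name__ == "__main__":
    for cpu, psw in sorted(IPSW.items()):
        print(f"CPU[{cpu}] ipsw={psw:#018x} I={i_bit(psw)}")
```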
>Could be.
>Add mdelay(100) (or higher) after the lines of output you've added.
>That works if it's a functional problem that's not timing dependent.
Well, after a very long boot the system finally crashes without giving any
reason for the panic??? (Shouldn't all interruptions be handled by handle_interruption?)
Just in case, here is a short PIM analysis:
------- Processor 1 HPMC Information - PDC Version: 41.28 ------
GR of CPU[1]
00-03 0000000000000000 000000001041b018 000000001014dbf0 0000000000000000
04-07 0000000000008000 000000008d113c00 0000000040200000 0000000000008000
08-11 0000000000000000 000000008d2cd008 0000000080000000 00000000103fa2c8
12-15 0000000040180000 000000008d9a6280 00000000105389c0 0000000000000000
16-19 000000001045cf88 00000000103b6338 000000008d147010 ffffffffffffffff
20-23 00000000000001ff 0000000040178000 000000008d9a6280 0000000000088000
24-27 0000000040180000 0000000000000006 0000000040180000 00000000105389c0
28-31 0000000000000000 000000008d7ccef0 000000008d7ccf40 0000000000008000
GR[02] == rp = 000000001014dbf0
Func: zap_page_range, Off: 0xe0, Addr: 0x1014dbf0
1014dbf0: 08 0e 02 5b copy r14,dp
1014dbf4: 03 c0 08 b4 mfctl tr6,r20
1014dbf8: 4a 93 00 b0 ldw 58(r20),r19
1014dbfc: 29 c5 20 00 addil b000,r14,%r1
GR[22] == t1(32bits) == arg4(64bits) = 000000008d9a6280
GR[21] == t2(32bits) == arg5(64bits) = 0000000040178000
GR[20] == t3(32bits) == arg6(64bits) = 00000000000001ff
GR[19] == t4(32bits) == arg7(64bits) = ffffffffffffffff
GR[26] == arg0 = 0000000040180000
GR[25] == arg1 = 0000000000000006
GR[24] == arg2 = 0000000040180000
GR[23] == arg3 = 0000000000088000
GR[27] == dp = 00000000105389c0
Func: __gp, Off: 0x0, Addr: 0x105389c0
GR[28] == ret0 = 0000000000000000
GR[29] == ret1 or sl = 000000008d7ccef0
GR[30] == sp = 000000008d7ccf40
GR[31] == ble rp = 0000000000008000
Not parsable address!
CR of CPU[1]
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 00000000000002b2 0000000000000000 00000000000000c0 0000000000000003
12-15 0000000000000000 0000000000000000 0000000000107000 ffe0000000000000
16-19 000003182e3e3f89 0000000000000000 000000001014deac 0000000036b52000
20-23 00000000103401f5 00000000f33ccdd8 000000ff080ef70f 8000000000000000
24-27 0000000000461000 000000007d147000 0000000000041020 000000ffff95c810
28-31 5555555555555555 5555555555555555 000000008d7cc000 00000000105a0000
CR[00] == rctr = 0000000000000000
CR[08] == (Protection ID) pidr1 = 00000000000002b2
CR[10] == ccr = 00000000000000c0
CR[11] == sar = 0000000000000003
CR[14] == iva = 0000000000107000
CR[15] == eiem = ffe0000000000000
CR[16] == itmr = 000003182e3e3f89
CR[17] == pcsq = 0000000000000000
CR[18] == pcoq = 000000001014deac
CR[19] == iir = 0000000036b52000
CR[20] == isr = 00000000103401f5
CR[21] == ior = 00000000f33ccdd8
CR[22] == ipsw = 000000ff080ef70f
CR[23] == eirw = 8000000000000000
CR[24] == tr0 (ptov) = 0000000000461000
CR[25] == tr1 (vtop) = 000000007d147000
CR[26] == tr2 = 0000000000041020
CR[27] == tr3 = 000000ffff95c810
CR[28] == tr4 = 5555555555555555
CR[29] == tr5 = 5555555555555555
CR[30] == tr6 = 000000008d7cc000
CR[31] == tr7 = 00000000105a0000
SR of CPU[1]
00-03 0000ac80 0000ac80 00000000 0000ac80
04-07 00000000 00000000 00000000 00000000
Need much more work !!!
SR[00] == ts0 = 0000ac80
SR[01] == ts1 = 0000ac80
SR[03] == cpp = 0000ac80
Not parsable address!
...
IIA Offset (back entry) = 0x000000001014dea0
...
e.g. IAOQ = 0x000000001014dea0
FPR of CPU[1]
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 000000008f760ec0 0000000000000002 000000001359d740 0000000000000420
08-11 0000000000000000 0000000000000802 00000000105389c0 000000001059a000
12-15 0000000013590000 0000000000000000 0000000010180574 00000000103dc6b8
16-19 00000000000009ee 000000008fa7e000 00000000105389c0 0000000013590000
20-23 00000000103b7b0c fffffffffffffff4 000000000000021e 0000002f66666667
24-27 000007b100000000 0000999903590b70 0000000003590b78 000000001041b980
28-31 000000001041b980 00000000ff915e20 0000000010187b38 0000000000000004
Parse IAOQ = 0x000000001014dea0 for CPU[1]
Func: zap_page_range, Off: 0x390, Addr: 0x1014dea0
1014dea0: 06 a0 52 00 pdtlb r0(sr1,r21)
1014dea4: 37 39 3f ff ldo -1(r25),r25
1014dea8: bf 33 3f e5 cmpb,*<> r19,r25,1014dea0 <zap_page_range+0x390>
1014deac: 36 b5 20 00 ldo 1000(r21),r21
------- Processor 3 HPMC Information - PDC Version: 41.28 ------
GR of CPU[3]
00-03 0000000000000000 0000000010429028 000000001010cdd0 0000000000000021
04-07 000000008d0c05b8 00000000105389c0 000000000000000f 0000000000000000
08-11 0000000000000000 0000000040026ee2 0000000040039141 0000000040026fb4
12-15 0000000040028380 00000000faf00950 00000000400342f4 0000000000000000
16-19 000000008d0c05b8 00000000faf00910 00000000faf00910 0000000000058706
20-23 000003182e080065 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 00000000000003e8 00000000105389c0
28-31 0000000000086470 0000000000086470 000000008d0c0b40 0000000000000226
GR[02] == rp = 000000001010cdd0
Func: handle_interruption, Off: 0xb0, Addr: 0x1010cdd0
1010cdd0: 08 05 02 5b copy r5,dp
1010cdd4: 02 00 08 b4 mfctl itmr,r20
1010cdd8: 02 00 08 b3 mfctl itmr,r19
1010cddc: 0a 93 04 33 sub r19,r20,r19
...
1010cde0: be 7c bf e5 cmpb,*>> ret0,r19,1010cdd8 <handle_interruption+0xb8>
...
...
1010cdec: ec 7f bf c5 cmpib,*<> -1,r3,1010cdd4 <handle_interruption+0xb4>
...
GR[22] == t1(32bits) == arg4(64bits) = 0000000000000000
GR[21] == t2(32bits) == arg5(64bits) = 0000000000000000
GR[20] == t3(32bits) == arg6(64bits) = 000003182e080065
GR[19] == t4(32bits) == arg7(64bits) = 0000000000058706
GR[26] == arg0 = 00000000000003e8
GR[25] == arg1 = 0000000000000000
GR[24] == arg2 = 0000000000000000
GR[23] == arg3 = 0000000000000000
GR[27] == dp = 00000000105389c0
Func: __gp, Off: 0x0, Addr: 0x105389c0
GR[28] == ret0 = 0000000000086470
GR[29] == ret1 or sl = 0000000000086470
GR[30] == sp = 000000008d0c0b40
GR[31] == ble rp = 0000000000000226
Not parsable address!
CR of CPU[3]
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 00000000000002b8 0000000000000000 00000000000000c0 000000000000003f
12-15 0000000000000000 0000000000000000 0000000000107000 ffe0000000000000
16-19 000003182e158ca8 0000000000000000 000000001010cde0 00000000be7cbfe5
20-23 00000000103401f4 00000000300c0b50 000000ff0804ff0e 8000000000000000
24-27 0000000000461000 000000007d0c4000 0000000000041020 000000ffff95c810
28-31 000000ffff95c810 5555555555555555 000000008d0c0000 0000000000008020
CR[00] == rctr = 0000000000000000
CR[08] == (Protection ID) pidr1 = 00000000000002b8
CR[10] == ccr = 00000000000000c0
CR[11] == sar = 000000000000003f
CR[14] == iva = 0000000000107000
CR[15] == eiem = ffe0000000000000
CR[16] == itmr = 000003182e158ca8
CR[17] == pcsq = 0000000000000000
CR[18] == pcoq = 000000001010cde0
CR[19] == iir = 00000000be7cbfe5
CR[20] == isr = 00000000103401f4
CR[21] == ior = 00000000300c0b50
CR[22] == ipsw = 000000ff0804ff0e
CR[23] == eirw = 8000000000000000
CR[24] == tr0 (ptov) = 0000000000461000
CR[25] == tr1 (vtop) = 000000007d0c4000
CR[26] == tr2 = 0000000000041020
CR[27] == tr3 = 000000ffff95c810
CR[28] == tr4 = 000000ffff95c810
CR[29] == tr5 = 5555555555555555
CR[30] == tr6 = 000000008d0c0000
CR[31] == tr7 = 0000000000008020
SR of CPU[3]
00-03 0000ae00 00006e00 00000000 0000ae00
04-07 00000000 00000000 00000000 00000000
Need much more work !!!
SR[00] == ts0 = 0000ae00
SR[01] == ts1 = 00006e00
SR[03] == cpp = 0000ae00
Not parsable address!
...
IIA Offset (back entry) = 0x000000001010cde4
...
e.g. IAOQ = 0x000000001010cde4
FPR of CPU[3]
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 000000008f760ec0 0000000000000002 000000001359d740 0000000000000420
08-11 0000000000000000 0000000000000802 00000000105389c0 000000001059a000
12-15 0000000013590000 0000000000000000 0000000010180574 00000000103dc6b8
16-19 00000000000009ee 000000008fa7e000 00000000105389c0 0000000013590000
20-23 00000000103b7b0c fffffffffffffff4 0000000000000000 0000000000000000
24-27 0000999900000000 0000999903590b70 0000000003590b78 000000001041b980
28-31 000000001041b980 00000000ff915e20 0000000010187b38 0000000000000000
Parse IAOQ = 0x000000001010cde4 for CPU[3]
Func: handle_interruption, Off: 0xc4, Addr: 0x1010cde4
1010cde0: be 7c bf e5 cmpb,*>> ret0,r19,1010cdd8 <handle_interruption+0xb8>
1010cde4: 08 00 02 40 nop
1010cde8: 34 63 3f ff ldo -1(r3),r3
1010cdec: ec 7f bf c5 cmpib,*<> -1,r3,1010cdd4 <handle_interruption+0xb4>
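The inner loop above (mfctl itmr, sub, then cmpb against ret0) looks like a cycle-counting delay, which would fit the mdelay(100) added for this test. That reading is an assumption; the dump itself does not say so. Under it, the CPU[3] register values convert as sketched below. The 550 MHz clock is a hypothetical figure used only to show the conversion, since the actual CPU clock is not stated in this mail.

```python
# Register values copied from "GR of CPU[3]" above.
RET0 = 0x86470  # GR[28]: value the elapsed itmr count is compared against
R19 = 0x58706   # GR[19]: last computed itmr difference (r19 - r20)
ARG0 = 0x3E8    # GR[26]

ASSUMED_MHZ = 550  # hypothetical clock rate, NOT taken from the dump


def cycles_to_us(cycles, cpu_mhz):
    """Convert a cycle count to microseconds for a given clock in MHz."""
    return cycles / cpu_mhz


if __name__ == "__main__":
    print("ret0 =", RET0, "cycles")
    print("r19  =", R19, "cycles elapsed so far")
    print("arg0 =", ARG0)
    print("budget at the assumed clock:", cycles_to_us(RET0, ASSUMED_MHZ), "us")
```

If the assumption holds, CPU[3] was simply sitting in the added delay when the HPMC hit, which would make CPU[1] the more interesting one to look at.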
Any idea?
>Otherwise setup kernel crash dump and use tools from bruno/phi to view
>contents of the kernel message buffer.
Well, that seems to be the ultimate solution (I don't remember whether it also
works on an SMP kernel), but I will need to discuss it with them a bit: if
I do manage to get a dump, how can it be analysed?
Thanks again for your attention,
Joel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-09-30 16:31 Joel Soete
@ 2003-09-30 18:50 ` Grant Grundler
0 siblings, 0 replies; 11+ messages in thread
From: Grant Grundler @ 2003-09-30 18:50 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Tue, Sep 30, 2003 at 06:31:17PM +0200, Joel Soete wrote:
> Hi Grant,
>
> Here is the very last test I did yesterday with the additional mdelay(100):
>
> >TOC the machine, "ser pim" and look at PSW in TOC Info for each CPU.
> >bit 0 is the I-Bit IIRC.
>
> In summary:
> ------- Processor 1 HPMC Information - PDC Version: 41.28 ------
Did you TOC the machine or did it HPMC?
I was under the impression the SW had hung and one needed to TOC
to regain control. TOC info is separate from HPMC info.
If it is in fact an HPMC, then look at IAOQ/GR02 for both CPUs
and see which functions they were executing when the HPMC occurred.
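Mapping IAOQ and GR02 to a function amounts to a nearest-lower-symbol lookup. A minimal sketch in Python, assuming a tiny symbol table whose start addresses are derived from the "Func / Off / Addr" lines in Joel's PIM analysis (start = Addr - Off). A real lookup would use the kernel's System.map, and the name resolve is made up for this example.

```python
import bisect

# Start addresses derived from the PIM analysis (Addr - Off), sorted by address.
SYMBOLS = sorted([
    (0x1010CD20, "handle_interruption"),
    (0x1014DB10, "zap_page_range"),
])


def resolve(addr, symbols=SYMBOLS):
    """Return (name, offset) of the closest symbol at or below addr, or None.

    With such a short table, any address above the last symbol is attributed
    to it, so results are only meaningful for addresses inside known functions.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


if __name__ == "__main__":
    for label, addr in [("CPU[1] IAOQ", 0x1014DEA0), ("CPU[1] GR02", 0x1014DBF0),
                        ("CPU[3] IAOQ", 0x1010CDE4), ("CPU[3] GR02", 0x1010CDD0)]:
        name, off = resolve(addr)
        print(f"{label}: {name}+{off:#x}")
```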
grant
^ permalink raw reply [flat|nested] 11+ messages in thread
* [parisc-linux] N Class SMP pb ? (follow up)
@ 2003-09-25 14:56 Joel Soete
2003-09-25 15:41 ` Derek Engelhaupt
2003-09-25 23:35 ` Grant Grundler
0 siblings, 2 replies; 11+ messages in thread
From: Joel Soete @ 2003-09-25 14:56 UTC (permalink / raw)
To: parisc-linux
Hi all,
Continuing the investigation, I put a printk at the beginning of handle_interruption()
to print just the interruption 'code' being handled.
As already mentioned in a previous mail, I see many 6 and 15 (but
that seems to be normal: even on a UP kernel those interruptions occur), but
(most interesting) this is the very first time that I got the message that
makes the kernel fail:
[...]
handle_interruption(26, ...).
SMP CALL FUNCTION TIMED OUT (CPU=1)
handle_interruption(26, ...).
Stack dump:
[...]
(unfortunately I couldn't grab this dump :( )
Could this be a problem with the synchronisation of the time reference between
the CPUs? (because timeout = jiffies + HZ)
I also looked for where this function is called, but I never see its return
code tested in order to trigger a 'stack dump' and stop the system?
Thanks in advance for help,
Joel
PS: I don't know if it is important, but the two CPUs in this server are located
in slots 1 and 3 (not in slots 1 and 2 as one would logically expect)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-09-25 14:56 Joel Soete
@ 2003-09-25 15:41 ` Derek Engelhaupt
2003-09-25 23:35 ` Grant Grundler
1 sibling, 0 replies; 11+ messages in thread
From: Derek Engelhaupt @ 2003-09-25 15:41 UTC (permalink / raw)
To: parisc-linux
They are in the right slots: N Class CPUs are loaded in the order 1,3,5,7,0,2,4,6. If you are looking at the back of the machine with the rear cover open, the two CPUs should be in the left two slots. The first memory carrier should be in the rightmost slot, with further carriers loaded toward the left. I should know, since I just had to tear apart an entire N to upgrade it from 6 550 MHz CPUs to 8 750 MHz CPUs. It takes about 3 hours and requires a system board change. The N has 3 system board revisions: A, B, and C. "A" is for 360-440. "B" is for 360-550. And "C" is for 650-750, but I'm sure it would accept all the processors slower than 650 too with the right speed setting on the DIP switches.
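The loading order above can be written down as a small lookup. A minimal sketch; the function name populated_slots is made up for this example, and the order is taken only from the sentence above.

```python
# CPU slot loading order for the N Class, as given above.
N_CLASS_CPU_LOAD_ORDER = [1, 3, 5, 7, 0, 2, 4, 6]


def populated_slots(n_cpus):
    """Return the slots that should be populated for a given CPU count."""
    if not 0 <= n_cpus <= len(N_CLASS_CPU_LOAD_ORDER):
        raise ValueError("an N Class holds between 0 and 8 CPUs")
    return N_CLASS_CPU_LOAD_ORDER[:n_cpus]


if __name__ == "__main__":
    print(populated_slots(2))
```

For a two-CPU machine this gives slots 1 and 3, which is exactly the configuration Joel describes.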
derek
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [parisc-linux] N Class SMP pb ? (follow up)
2003-09-25 14:56 Joel Soete
2003-09-25 15:41 ` Derek Engelhaupt
@ 2003-09-25 23:35 ` Grant Grundler
1 sibling, 0 replies; 11+ messages in thread
From: Grant Grundler @ 2003-09-25 23:35 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Thu, Sep 25, 2003 at 04:56:26PM +0200, Joel Soete wrote:
...
> As already mentionned in previous mail that I could read many 6, 15 (but
> it seems to be normal in UP kernel those interruption occurs)
Yes - 6 is ITLB miss and 15 is Data TLB miss.
> but (most interesting) it is the very first time that I got
> the message making failed the kernel:
> [...]
> handle_interruption(26, ...).
26 is "Data Memory Access rights Trap".
This sounds normal for Copy-On-Write.
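The three codes discussed so far can be decoded from the printk output with a small table. A minimal sketch in Python; it covers only the codes named in this thread (6, 15, 26), and the regular expression assumes the "handle_interruption(N, ...)." line format from Joel's mail. The name describe is made up for this example.

```python
import re

# Only the interruption codes named in this thread; other codes are left unknown.
INTERRUPTION_NAMES = {
    6: "ITLB miss",
    15: "Data TLB miss",
    26: "Data memory access rights trap",
}

LINE_RE = re.compile(r"handle_interruption\((\d+),")


def describe(line):
    """Return (code, name) for a handle_interruption() log line, else None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return code, INTERRUPTION_NAMES.get(code, "unknown")


if __name__ == "__main__":
    for line in ["handle_interruption(26, ...).",
                 "SMP CALL FUNCTION TIMED OUT (CPU=1)"]:
        print(line, "->", describe(line))
```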
> SMP CALL FUNCTION TIMED OUT (CPU=1)
The IPI handler will time out if the other CPU doesn't ack
the function call within a second. This is bad.
It means either the other CPU never got the interrupt (locked up
with I-bit off) or the "unstarted_count" isn't coherent
between the CPUs.
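The wait described above can be modelled in a few lines. This is a toy model in Python, not the kernel code: time is a simulated jiffies counter, HZ = 100 is an assumed tick rate, and each other CPU is represented only by the tick at which it acks (None if it never does). The function name is made up for this example.

```python
HZ = 100  # assumed tick rate; "timeout = jiffies + HZ" is one second either way


def smp_call_function_model(ack_jiffy, hz=HZ):
    """Simulate the sender waiting for every other CPU to ack the IPI.

    ack_jiffy: one entry per other CPU, the tick at which it acks, or None
    if it never does (e.g. locked up with the I-bit off).
    Returns the tick at which unstarted_count reached 0, or None on timeout.
    """
    timeout = hz  # jiffies starts at 0 in this model
    for jiffies in range(timeout):  # models: while time_before(jiffies, timeout)
        unstarted_count = sum(1 for t in ack_jiffy if t is None or t > jiffies)
        if unstarted_count == 0:
            return jiffies
    return None  # models "SMP CALL FUNCTION TIMED OUT"


if __name__ == "__main__":
    print(smp_call_function_model([3]))     # other CPU acks after 3 ticks
    print(smp_call_function_model([None]))  # other CPU never acks
```

The model covers only the first failure mode (a CPU that never acks); an incoherent unstarted_count would look the same from the sender's side.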
> handle_interruption(26, ...).
>
> Could this be a pb with sync between cpu time ref?
> (because timeout = jiffies + HZ)
I don't think so since jiffies is a global.
And it's always measured on the same CPU.
> I have also a look for where this function is called but never see its return
> code tested to launch a 'stack dump' and a stop of system?
You need to find out who is using smp_call_function() and which function
they are trying to invoke. I suspect it's coming from mm/slab.c but
wouldn't know which of the three it might be.
grant
^ permalink raw reply [flat|nested] 11+ messages in thread
Thread overview: 11+ messages
2003-09-26 15:46 [parisc-linux] N Class SMP pb ? (follow up) Joel Soete
2003-09-26 16:08 ` Joel Soete
2003-09-26 16:50 ` Grant Grundler
2003-09-27 18:16 ` Joel Soete
-- strict thread matches above, loose matches on Subject: below --
2003-10-01 6:48 Joel Soete
2003-10-01 17:20 ` Joel Soete
2003-09-30 16:31 Joel Soete
2003-09-30 18:50 ` Grant Grundler
2003-09-25 14:56 Joel Soete
2003-09-25 15:41 ` Derek Engelhaupt
2003-09-25 23:35 ` Grant Grundler