* Re: Abnormal behaviour towards "INIT" interrupt management
2004-03-30 7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
@ 2004-03-30 15:19 ` Bjorn Helgaas
2004-03-30 18:52 ` Luck, Tony
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2004-03-30 15:19 UTC (permalink / raw)
To: linux-ia64
On Tuesday 30 March 2004 12:02 am, Francois Wellenreiter wrote:
> Then I push on the "dump" button that generates an INIT interruption to
> all the processors. This signal is then caught by PAL, and SAL which
> calls (with a reason [register GR11] equal to 2)
> "ia64_monarch_init_handler" on the monarch processor and
> "ia64_slave_init_handler" on the slave ones (to this point, I hope to be
> right, isn't it ?).
Yes. If I understand correctly, you observe that one processor
calls ia64_monarch_init_handler(), and all the others call
ia64_slave_init_handler(). So far, that is correct behavior.
> What I've noticed using traces (and further an ITP tool) is that for
> each processor the "ia64_monarch_init_handler" is ever called. :-(
Are you saying that more than one processor calls
ia64_monarch_init_handler()? If so, I think something
is broken. But I haven't seen that behavior. Here is
ia64_slave_init_handler():
GLOBAL_ENTRY(ia64_slave_init_handler)
1: br.sptk 1b
END(ia64_slave_init_handler)
So I'd be surprised to see the slave processors do anything
interesting.
I do know that we are missing some useful behavior in this area --
namely, we don't extract the min-state area for the slave processors,
so we don't get backtraces for the currently-running tasks on them.
Bjorn
* RE: Abnormal behaviour towards "INIT" interrupt management
From: Luck, Tony @ 2004-03-30 18:52 UTC (permalink / raw)
To: linux-ia64
>> What I've noticed using traces (and further an ITP tool) is that for
>> each processor the "ia64_monarch_init_handler" is ever called. :-(
>
>Are you saying that more than one processor calls
>ia64_monarch_init_handler()? If so, I think something
>is broken.
Or are you saying that NO processor gets to ia64_monarch_init_handler?
Just a sanity check ... when you set the ITP breakpoint to catch
the entry ... you did set it at the *physical* address of this function?
-Tony
* Re: Abnormal behaviour towards "INIT" interrupt management
From: Francois Wellenreiter @ 2004-03-31 6:37 UTC (permalink / raw)
To: linux-ia64
>>Then I push on the "dump" button that generates an INIT interruption to
>>all the processors. This signal is then caught by PAL, and SAL which
>>calls (with a reason [register GR11] equal to 2)
>>"ia64_monarch_init_handler" on the monarch processor and
>>"ia64_slave_init_handler" on the slave ones (to this point, I hope to be
>>right, isn't it ?).
>
>
> Yes. If I understand correctly, you observe that one processor
> calls ia64_monarch_init_handler(), and all the others call
> ia64_slave_init_handler(). So far, that is correct behavior.
Hum, in fact, that's the behaviour I was expecting to observe... reading
the kernel code.
>>What I've noticed using traces (and further an ITP tool) is that for
>>each processor the "ia64_monarch_init_handler" is ever called. :-(
>
>
> Are you saying that more than one processor calls
> ia64_monarch_init_handler()? If so, I think something
> is broken. But I haven't seen that behavior. Here is
> ia64_slave_init_handler():
>
> GLOBAL_ENTRY(ia64_slave_init_handler)
> 1: br.sptk 1b
> END(ia64_slave_init_handler)
>
Right, I was just surprised to see the trace "Entered OS INIT
handler" appear 4 times on my console screen. Then, I ran a JTAG debugger
to set breakpoints in the kernel code and noticed that all CPUs called
the same function, i.e. "ia64_monarch_init_handler" (without using
any rendezvous mechanism?!).
> So I'd be surprised to see the slave processors do anything
> interesting.
>
> I do know that we are missing some useful behavior in this area --
> namely, we don't extract the min-state area for the slave processors,
> so we don't get backtraces for the currently-running tasks on them.
On that precise point, I thought that PAL recorded the register state
in a dedicated per-processor area called the "min-state area" for each
processor receiving the INIT signal (and in the present case all the
processors catch it). Can you confirm this?
In fact, I want to be able to push the "dump" button and get the
information for all the running processors (not just one), and then
determine, for example, which one is deadlocked.
* Re: Abnormal behaviour towards "INIT" interrupt management
From: Francois Wellenreiter @ 2004-03-31 6:43 UTC (permalink / raw)
To: linux-ia64
>>>What I've noticed using traces (and further an ITP tool) is that for
>>>each processor the "ia64_monarch_init_handler" is ever called. :-(
>>
>>Are you saying that more than one processor calls
>>ia64_monarch_init_handler()? If so, I think something
>>is broken.
>
>
> Or are you saying that NO processor gets to ia64_monarch_init_handler?
All the processors call the "ia64_monarch_init_handler",
the result is that the same trace appears many times (1 per CPU in fact)
and a kernel oops occurs (I think it is due to a concurrent access
to the same stack without any lock mechanism).
> Just a sanity check ... when you set the ITP breakpoint to catch
> the entry ... you did set it at the *physical* address of this function?
Yes, using Hardware breakpoints. When managing INIT interrupt, SAL calls
the physical address of my function.
* Re: Abnormal behaviour towards "INIT" interrupt management
From: Bjorn Helgaas @ 2004-03-31 19:44 UTC (permalink / raw)
To: linux-ia64
On Tuesday 30 March 2004 11:43 pm, Francois Wellenreiter wrote:
> All the processors call the "ia64_monarch_init_handler",
> the result is that the same trace appears many times (1 per CPU in fact)
> and a kernel oops occurs (I think it is due to a concurrent access
> to the same stack without any lock mechanism).
What kind of machine is this? Linux registers different OS_INIT
procedures for the monarch and non-monarch processors, so it
sounds like a SAL bug if more than one CPU calls
ia64_monarch_init_handler().
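If the firmware really does send every CPU to the monarch entry point, one
defensive option would be to let only the first arrival act as monarch. The
following is a minimal user-space sketch of that idea in C11, not the kernel's
actual code: monarch_cpu, claim_monarch() and init_entry() are hypothetical
names, and a real INIT handler would have to do the equivalent very early,
before touching any shared stack.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_MONARCH (-1)

/* Hypothetical guard: ID of the CPU that won the election, or NO_MONARCH. */
static atomic_int monarch_cpu = NO_MONARCH;

/* Returns true for exactly one caller: the first CPU to get here. */
static bool claim_monarch(int cpu)
{
    int expected = NO_MONARCH;

    assert(cpu >= 0);
    /* Atomically swap NO_MONARCH for our ID; fails if someone already won. */
    return atomic_compare_exchange_strong(&monarch_cpu, &expected, cpu);
}

/* Stand-in for an entry point that every CPU reaches on INIT. */
static const char *init_entry(int cpu)
{
    if (claim_monarch(cpu))
        return "monarch";   /* only this CPU would go on to dump state */
    return "slave";         /* the others would park, like the slave handler */
}
```

With this guard, the first CPU to call init_entry() is treated as the monarch
and every later caller (including the same CPU re-entering) takes the slave
path, so the shared stack would be used by one CPU only.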
* Re: Abnormal behaviour towards "INIT" interrupt management
From: Jim Garlick @ 2004-04-01 3:23 UTC (permalink / raw)
To: linux-ia64
On Wed, 31 Mar 2004, Bjorn Helgaas wrote:
> On Tuesday 30 March 2004 11:43 pm, Francois Wellenreiter wrote:
> > All the processors call the "ia64_monarch_init_handler",
> > the result is that the same trace appears many times (1 per CPU in fact)
> > and a kernel oops occurs (I think it is due to a concurrent access
> > to the same stack without any lock mechanism).
>
> What kind of machine is this? Linux registers different OS_INIT
> procedures for the monarch and non-monarch processors, so it
> sounds like a SAL bug if more than one CPU calls
> ia64_monarch_init_handler().
We see this also on Intel Tiger4's.
Jim
* Re: Abnormal behaviour towards "INIT" interrupt management
From: Francois Wellenreiter @ 2004-04-01 6:26 UTC (permalink / raw)
To: linux-ia64
>
>>All the processors call the "ia64_monarch_init_handler",
>>the result is that the same trace appears many times (1 per CPU in fact)
>>and a kernel oops occurs (I think it is due to a concurrent access
>>to the same stack without any lock mechanism).
>
>
> What kind of machine is this?
I ran my tests first on a Tiger4 (with SAL version 3.00)
and then on a Bull NovaScale 5160.
Both gave the same results.
> Linux registers different OS_INIT
> procedures for the monarch and non-monarch processors, so it
> sounds like a SAL bug if more than one CPU calls
> ia64_monarch_init_handler().
That's what I fear.
Francois WELLENREITER