public inbox for linux-ia64@vger.kernel.org
* Abnormal behaviour towards "INIT" interrupt management
@ 2004-03-30  7:02 Francois Wellenreiter
  2004-03-30 15:19 ` Bjorn Helgaas
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Francois Wellenreiter @ 2004-03-30  7:02 UTC (permalink / raw)
  To: linux-ia64


			Hi,

	I already sent this mail but received no feedback, so I am trying
again (apologies to anyone not concerned by this).
I'd like to report what I believe is a potential bug in INIT
handling (at present I don't know exactly where the problem
originates).

While testing the "dump" button on a 4-way Itanium-2 machine with
SAL 3.0, I noticed behaviour that is not in line with the SAL specification.
In the "ia64_mca_init" function (in arch/ia64/kernel/mca.c),
we register the functions "ia64_monarch_init_handler" and 
"ia64_slave_init_handler".

Then I push on the "dump" button that generates an INIT interruption to 
all the processors. This signal is then caught by PAL, and SAL which 
calls (with a reason [register GR11] equal to 2) 
"ia64_monarch_init_handler" on the monarch processor and 
"ia64_slave_init_handler" on the slave ones (to this point, I hope to be 
right, isn't it ?).
What I've noticed using traces (and further an ITP tool) is that for 
each processor the "ia64_monarch_init_handler" is ever called. :-(
Could someone tell me whether they have already encountered this problem,
or whether it is expected behaviour?
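
For reference, the dispatch that the kernel code leads one to expect can
be modelled in a few lines of user-space C. This is only a sketch: the
struct os_init_vectors type, the model_* functions and the NR_CPUS value
are invented for illustration and are not the kernel's or SAL's real
interfaces.

```c
#include <assert.h>

#define NR_CPUS 4   /* assumption: matches the 4-way test machine */

/* Hypothetical stand-in for the two OS_INIT entry points given to SAL. */
struct os_init_vectors {
    void (*monarch)(int cpu);
    void (*slave)(int cpu);
};

static int monarch_calls;   /* how many CPUs entered the monarch handler */
static int slave_calls;     /* how many CPUs entered the slave handler */

static void model_monarch_init_handler(int cpu) { (void)cpu; monarch_calls++; }
static void model_slave_init_handler(int cpu)   { (void)cpu; slave_calls++; }

/*
 * Expected behaviour: on INIT, exactly one CPU is sent to the monarch
 * entry point and every other CPU is sent to the slave entry point.
 */
static void model_sal_deliver_init(const struct os_init_vectors *v,
                                   int monarch_cpu)
{
    assert(monarch_cpu >= 0 && monarch_cpu < NR_CPUS);
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == monarch_cpu)
            v->monarch(cpu);
        else
            v->slave(cpu);
    }
}
```

With this model, one INIT should leave monarch_calls at 1 and slave_calls
at NR_CPUS - 1; a monarch count equal to the number of CPUs corresponds
to the symptom reported here.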

For information, I ran the same test on a 16-way Itanium-2 machine
with SAL 3.1 and the result is exactly the same.

Thanks for your help.

Best regards,

			Francois WELLENREITER



* Re: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
@ 2004-03-30 15:19 ` Bjorn Helgaas
  2004-03-30 18:52 ` Luck, Tony
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2004-03-30 15:19 UTC (permalink / raw)
  To: linux-ia64

On Tuesday 30 March 2004 12:02 am, Francois Wellenreiter wrote:
> Then I push on the "dump" button that generates an INIT interruption to 
> all the processors. This signal is then caught by PAL, and SAL which 
> calls (with a reason [register GR11] equal to 2) 
> "ia64_monarch_init_handler" on the monarch processor and 
> "ia64_slave_init_handler" on the slave ones (to this point, I hope to be 
> right, isn't it ?).

Yes.  If I understand correctly, you observe that one processor
calls ia64_monarch_init_handler(), and all the others call
ia64_slave_init_handler().  So far, that is correct behavior.

> What I've noticed using traces (and further an ITP tool) is that for 
> each processor the "ia64_monarch_init_handler" is ever called. :-(

Are you saying that more than one processor calls
ia64_monarch_init_handler()?  If so, I think something
is broken.  But I haven't seen that behavior.  Here is
ia64_slave_init_handler():

	GLOBAL_ENTRY(ia64_slave_init_handler)
	1:      br.sptk 1b
	END(ia64_slave_init_handler)

So I'd be surprised to see the slave processors do anything
interesting.

I do know that we are missing some useful behavior in this area --
namely, we don't extract the min-state area for the slave processors,
so we don't get backtraces for the currently-running tasks on them.

Bjorn




* RE: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
  2004-03-30 15:19 ` Bjorn Helgaas
@ 2004-03-30 18:52 ` Luck, Tony
  2004-03-31  6:37 ` Francois Wellenreiter
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2004-03-30 18:52 UTC (permalink / raw)
  To: linux-ia64

>> What I've noticed using traces (and further an ITP tool) is that for 
>> each processor the "ia64_monarch_init_handler" is ever called. :-(
>
>Are you saying that more than one processor calls
>ia64_monarch_init_handler()?  If so, I think something
>is broken.

Or are you saying that NO processor gets to ia64_monarch_init_handler?

Just a sanity check ... when you set the ITP breakpoint to catch
the entry ... you did set it at the *physical* address of this function?
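
As a rough illustration of why the address matters: IA-64 virtual
addresses carry a 3-bit region number in bits 63:61, and for addresses
in Linux's identity-mapped region 7 the physical address is obtained by
clearing those bits. The helper names below are hypothetical, and the
sketch assumes the handler's address really is a region-7 address; a
symbol mapped elsewhere would need a proper translation instead.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_SHIFT 61
#define REGION_MASK  (UINT64_C(7) << REGION_SHIFT)

/* Region number = top three bits of an IA-64 virtual address. */
static inline unsigned region_of(uint64_t vaddr)
{
    return (unsigned)(vaddr >> REGION_SHIFT);
}

/*
 * Assumption: vaddr lies in the identity-mapped region 7, so clearing
 * the region bits yields the physical address.
 */
static inline uint64_t region7_to_phys(uint64_t vaddr)
{
    assert(region_of(vaddr) == 7);
    return vaddr & ~REGION_MASK;
}
```

A breakpoint set at the virtual address would not match an entry made
in physical mode, which is the point of the sanity check.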

-Tony


* Re: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
  2004-03-30 15:19 ` Bjorn Helgaas
  2004-03-30 18:52 ` Luck, Tony
@ 2004-03-31  6:37 ` Francois Wellenreiter
  2004-03-31  6:43 ` Francois Wellenreiter
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Francois Wellenreiter @ 2004-03-31  6:37 UTC (permalink / raw)
  To: linux-ia64


>>Then I push on the "dump" button that generates an INIT interruption to 
>>all the processors. This signal is then caught by PAL, and SAL which 
>>calls (with a reason [register GR11] equal to 2) 
>>"ia64_monarch_init_handler" on the monarch processor and 
>>"ia64_slave_init_handler" on the slave ones (to this point, I hope to be 
>>right, isn't it ?).
> 
> 
> Yes.  If I understand correctly, you observe that one processor
> calls ia64_monarch_init_handler(), and all the others call
> ia64_slave_init_handler().  So far, that is correct behavior.

Hmm, in fact, that is the behaviour I was expecting to observe from
reading the kernel code.

>>What I've noticed using traces (and further an ITP tool) is that for 
>>each processor the "ia64_monarch_init_handler" is ever called. :-(
> 
> 
> Are you saying that more than one processor calls
> ia64_monarch_init_handler()?  If so, I think something
> is broken.  But I haven't seen that behavior.  Here is
> ia64_slave_init_handler():
> 
> 	GLOBAL_ENTRY(ia64_slave_init_handler)
> 	1:      br.sptk 1b
> 	END(ia64_slave_init_handler)
> 

Right, I was just surprised to see the trace "Entered OS INIT
handler" appear 4 times on my console. I then used a JTAG debugger
to set breakpoints in the kernel code and noticed that all CPUs called
the same function, i.e. "ia64_monarch_init_handler" (without using
any rendezvous mechanism ?!).

> So I'd be surprised to see the slave processors do anything
> interesting.
> 
> I do know that we are missing some useful behavior in this area --
> namely, we don't extract the min-state area for the slave processors,
> so we don't get backtraces for the currently-running tasks on them.

On that precise point, I thought that PAL recorded the register state
in a per-processor area called the "min-state area" for each processor
receiving the INIT signal (and in the present case all the processors
receive it). Can you confirm this?

In fact, I want to be able to push the "dump" button and get the
information for all the running processors (not just one), and then
determine, for example, which one is deadlocked.
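
To make the goal concrete, here is a minimal user-space sketch of
per-CPU state collection, assuming each processor has its own min-state
area as supposed above. Everything in it is hypothetical: struct
model_min_state is a heavily reduced stand-in (the real area holds far
more than two registers), and none of the model_* names exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4   /* assumption: 4-way machine */

/* Hypothetical, reduced stand-in for a per-CPU min-state save area. */
struct model_min_state {
    uint64_t ip;    /* interrupted instruction pointer */
    uint64_t sp;    /* interrupted stack pointer */
};

struct model_init_record {
    bool valid;                     /* set once this CPU has checked in */
    struct model_min_state state;   /* private copy of that CPU's state */
};

static struct model_init_record init_records[NR_CPUS];

/* Each CPU (monarch or slave) copies its own state into its own slot. */
static void model_record_min_state(int cpu, const struct model_min_state *ms)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    init_records[cpu].state = *ms;
    init_records[cpu].valid = true;
}

/* Monarch side: number of CPUs whose state is available for the dump. */
static int model_count_recorded(void)
{
    int n = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (init_records[cpu].valid)
            n++;
    return n;
}
```

Because every CPU writes only to its own slot, no lock is needed for the
copy itself; the monarch would only have to wait until the count reaches
the number of online CPUs before printing.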



* Re: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
                   ` (2 preceding siblings ...)
  2004-03-31  6:37 ` Francois Wellenreiter
@ 2004-03-31  6:43 ` Francois Wellenreiter
  2004-03-31 19:44 ` Bjorn Helgaas
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Francois Wellenreiter @ 2004-03-31  6:43 UTC (permalink / raw)
  To: linux-ia64

>>>What I've noticed using traces (and further an ITP tool) is that for 
>>>each processor the "ia64_monarch_init_handler" is ever called. :-(
>>
>>Are you saying that more than one processor calls
>>ia64_monarch_init_handler()?  If so, I think something
>>is broken.
> 
> 
> Or are you saying that NO processor gets to ia64_monarch_init_handler?

All the processors call the "ia64_monarch_init_handler",
the result is that the same trace appears many times (1 per CPU in fact)
and a kernel oops occurs (I think it is due to a concurrent access
to the same stack without any lock mechanism).
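
As an illustration of the kind of guard that is missing, the sketch
below lets only the first CPU that arrives act as monarch, using a C11
atomic flag. It is a user-space model under stated assumptions, not what
mca.c does: the model_* names are hypothetical, and the loop stands in
for CPUs that would really arrive concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the first CPU to arrive; later arrivals see it already set. */
static atomic_flag monarch_claimed = ATOMIC_FLAG_INIT;

/* Returns true for exactly one caller until the flag is released. */
static bool model_claim_monarch(void)
{
    return !atomic_flag_test_and_set(&monarch_claimed);
}

/* Called once INIT handling is complete, to re-arm the guard. */
static void model_release_monarch(void)
{
    atomic_flag_clear(&monarch_claimed);
}

/* Model ncpus CPUs all entering the same handler; count the winners. */
static int model_simulate_entries(int ncpus)
{
    int monarchs = 0;

    assert(ncpus > 0);
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (model_claim_monarch())
            monarchs++;
    return monarchs;
}
```

The CPUs that lose the claim would take the slave path instead of
touching the shared stack.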

> Just a sanity check ... when you set the ITP breakpoint to catch
> the entry ... you did set it at the *physical* address of this function?

Yes, using hardware breakpoints. When handling the INIT interrupt, SAL calls
the physical address of my function.




* Re: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
                   ` (3 preceding siblings ...)
  2004-03-31  6:43 ` Francois Wellenreiter
@ 2004-03-31 19:44 ` Bjorn Helgaas
  2004-04-01  3:23 ` Jim Garlick
  2004-04-01  6:26 ` Francois Wellenreiter
  6 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2004-03-31 19:44 UTC (permalink / raw)
  To: linux-ia64

On Tuesday 30 March 2004 11:43 pm, Francois Wellenreiter wrote:
> All the processors call the "ia64_monarch_init_handler",
> the result is that the same trace appears many times (1 per CPU in fact)
> and a kernel oops occurs (I think it is due to a concurrent access
> to the same stack without any lock mechanism).

What kind of machine is this?  Linux registers different OS_INIT
procedures for the monarch and non-monarch processors, so it
sounds like a SAL bug if more than one CPU calls
ia64_monarch_init_handler().


* Re: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
                   ` (4 preceding siblings ...)
  2004-03-31 19:44 ` Bjorn Helgaas
@ 2004-04-01  3:23 ` Jim Garlick
  2004-04-01  6:26 ` Francois Wellenreiter
  6 siblings, 0 replies; 8+ messages in thread
From: Jim Garlick @ 2004-04-01  3:23 UTC (permalink / raw)
  To: linux-ia64


On Wed, 31 Mar 2004, Bjorn Helgaas wrote:

> On Tuesday 30 March 2004 11:43 pm, Francois Wellenreiter wrote:
> > All the processors call the "ia64_monarch_init_handler",
> > the result is that the same trace appears many times (1 per CPU in fact)
> > and a kernel oops occurs (I think it is due to a concurrent access
> > to the same stack without any lock mechanism).
>
> What kind of machine is this?  Linux registers different OS_INIT
> procedures for the monarch and non-monarch processors, so it
> sounds like a SAL bug if more than one CPU calls
> ia64_monarch_init_handler().

We see this also on Intel Tiger4's.

Jim



* Re: Abnormal behaviour towards "INIT" interrupt management
  2004-03-30  7:02 Abnormal behaviour towards "INIT" interrupt management Francois Wellenreiter
                   ` (5 preceding siblings ...)
  2004-04-01  3:23 ` Jim Garlick
@ 2004-04-01  6:26 ` Francois Wellenreiter
  6 siblings, 0 replies; 8+ messages in thread
From: Francois Wellenreiter @ 2004-04-01  6:26 UTC (permalink / raw)
  To: linux-ia64


> 
>>All the processors call the "ia64_monarch_init_handler",
>>the result is that the same trace appears many times (1 per CPU in fact)
>>and a kernel oops occurs (I think it is due to a concurrent access
>>to the same stack without any lock mechanism).
> 
> 
> What kind of machine is this?  

I ran my tests first on a Tiger4 (with SAL version 3.00)
and then on a Bull NovaScale 5160.
Both tests gave the same results...

> Linux registers different OS_INIT
> procedures for the monarch and non-monarch processors, so it
> sounds like a SAL bug if more than one CPU calls
> ia64_monarch_init_handler().

That's what I fear.



		Francois WELLENREITER


