* [RFC] SAL_MC_RENDEZ logic
@ 2005-09-12 6:59 Hidetoshi Seto
2005-09-12 7:25 ` Keith Owens
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Hidetoshi Seto @ 2005-09-12 6:59 UTC
To: linux-ia64
Hi all,
I'm now testing the MCA code on a brand-new system, and I have
run into a problem where slave processors loop forever in
ia64_mca_wakeup_ipi_wait().
The cause is that the SAL clears the IRR bit just after its
spin in the SAL_MC_RENDEZ procedure, and the OS then spins again
in ia64_mca_wakeup_ipi_wait() until the IRR bit is set.
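The hang described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions: `struct irr_model`, `irr_post()`, `irr_consume()`, `irr_pending()`, `wakeup_wait()` and the vector value 0xf0 are hypothetical names and values chosen for illustration, not the kernel's code, and the spin is bounded so that the model returns instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the kernel defines the real vector. */
#define WAKEUP_VECTOR 0xf0

/* Software stand-in for the four 64-bit registers cr.irr0..cr.irr3. */
struct irr_model {
	uint64_t irr[4];
};

/* Mark a vector as pending, as an arriving wake-up IPI would. */
static void irr_post(struct irr_model *m, unsigned int vector)
{
	assert(vector < 256);
	m->irr[vector >> 6] |= UINT64_C(1) << (vector & 0x3f);
}

/* Drop a pending vector, standing in for a cr.ivr read done inside SAL. */
static void irr_consume(struct irr_model *m, unsigned int vector)
{
	assert(vector < 256);
	m->irr[vector >> 6] &= ~(UINT64_C(1) << (vector & 0x3f));
}

/* Test one vector's bit: register index is vector / 64, bit is vector % 64. */
static bool irr_pending(const struct irr_model *m, unsigned int vector)
{
	assert(vector < 256);
	return (m->irr[vector >> 6] >> (vector & 0x3f)) & 1;
}

/*
 * Bounded version of the OS-side spin. Returns true if the wake-up bit
 * was seen within max_polls polls, false if the loop would have hung.
 */
static bool wakeup_wait(const struct irr_model *m, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (irr_pending(m, WAKEUP_VECTOR))
			return true;
	}
	return false;
}
```

With the bit posted, `wakeup_wait()` returns true. Once something standing in for SAL has consumed the bit, the same call exhausts its poll budget, which corresponds to the infinite loop on real hardware.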
The SAL spec says:
(SAL_MC_RENDEZ:)
When this procedure returns, it is the responsibility of the
operating system to clear the IRR bits for the MC_rendezvous
interrupt and the wake up interrupt, if any.
I'm not sure, but it seems "if any" means that the SAL can clear
the IRR bits on behalf of the OS. So the OS shouldn't expect the
IRR bit to always be set on return from SAL_MC_RENDEZ. Is this right?
I found an archived message from 2 years ago, from Keith:
http://marc.theaimsgroup.com/?l=linux-ia64&m=105590709805820
However, there was no response...
I don't know whether there is any old SAL that never spins in
SAL_MC_RENDEZ. Or is this the beginning of a nightmare,
having different MCA code depending on the SAL version?
Thanks,
H.Seto
* Re: [RFC] SAL_MC_RENDEZ logic
From: Keith Owens @ 2005-09-12 7:25 UTC
To: linux-ia64
On Mon, 12 Sep 2005 15:59:04 +0900,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> wrote:
>I'm now testing the MCA code on a brand-new system, and I have
>run into a problem where slave processors loop forever in
>ia64_mca_wakeup_ipi_wait().
>
>The cause is that the SAL clears the IRR bit just after its
>spin in the SAL_MC_RENDEZ procedure, and the OS then spins again
>in ia64_mca_wakeup_ipi_wait() until the IRR bit is set.
>
>The SAL spec says:
> (SAL_MC_RENDEZ:)
> When this procedure returns, it is the responsibility of the
> operating system to clear the IRR bits for the MC_rendezvous
> interrupt and the wake up interrupt, if any.
The IRR bits are read only. The OS clears them by reading cr.ivr, in
the external interrupt vector. The only reason that mca.c tests IRR
directly is because at that point interrupts are disabled.
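To make the read-to-clear behaviour concrete, here is a rough user-space model of a cr.ivr read. It is a sketch, not the architecture's full behaviour: `struct lsapic_model`, `post_interrupt()` and `read_ivr()` are hypothetical names, the vector values are assumed, and the model ignores masking, in-service tracking and the special priority of low-numbered vectors, simply treating a higher vector number as higher priority.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value returned when no interrupt is pending. */
#define SPURIOUS_VECTOR 0x0f

/* Software stand-in for cr.irr0..cr.irr3; software never writes these. */
struct lsapic_model {
	uint64_t irr[4];
};

/* Hardware side: an arriving interrupt sets the bit for its vector. */
static void post_interrupt(struct lsapic_model *m, unsigned int vector)
{
	assert(vector < 256);
	m->irr[vector >> 6] |= UINT64_C(1) << (vector & 0x3f);
}

/*
 * Model of a cr.ivr read: return the highest-numbered pending vector
 * and clear its IRR bit as a side effect of the read.
 */
static unsigned int read_ivr(struct lsapic_model *m)
{
	for (int reg = 3; reg >= 0; reg--) {
		for (int bit = 63; bit >= 0; bit--) {
			uint64_t mask = UINT64_C(1) << bit;

			if (m->irr[reg] & mask) {
				m->irr[reg] &= ~mask;	/* the read clears the bit */
				return (unsigned int)(reg * 64 + bit);
			}
		}
	}
	return SPURIOUS_VECTOR;
}
```

In this model whoever reads cr.ivr first consumes the pending bit, so a later test of IRR for the same vector sees nothing. That is the situation when firmware performs the read before returning to the OS.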
>I'm not sure, but it seems "if any" means that the SAL can clear
>the IRR bits on behalf of the OS. So the OS shouldn't expect the
>IRR bit to always be set on return from SAL_MC_RENDEZ. Is this right?
The phrase "if any" is quite ambiguous, it is not clear what it means
here.
>I don't know whether there is any old SAL that never spins in
>SAL_MC_RENDEZ. Or is this the beginning of a nightmare,
>having different MCA code depending on the SAL version?
I hope not. In any case my MCA/INIT rewrite removes the spin in mca.c
waiting for IRR to be set. Instead the slave comes out of SAL due to a
wake up call, waits for the monarch to exit then the slaves all exit.
Once a slave resumes to its normal context and interrupts are enabled
again, then the external interrupt vector clears the wake up bit and
calls ia64_mca_wakeup_int_handler() which is a no-op. The rendezvous
IRR bit is cleared when we read cr.ivr prior to calling
ia64_mca_rendez_int_handler(), i.e. this bit is already clear when we
rendezvous.
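The difference between the two exit conditions can be sketched as follows. This is an illustrative model only, not code from the rewrite: `struct rendez_state`, `slave_may_exit_old()` and `slave_may_exit_new()` are hypothetical names, and the `monarch_done` flag is an assumed stand-in for however the rewrite tracks the monarch's exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what a slave can observe after leaving SAL. */
struct rendez_state {
	bool wakeup_irr_set;	/* wake-up bit still visible in IRR */
	bool monarch_done;	/* monarch has finished its MCA handling */
};

/* 2.6.13-style condition: the slave leaves only once it sees the IRR bit. */
static bool slave_may_exit_old(const struct rendez_state *s)
{
	assert(s != NULL);
	return s->wakeup_irr_set;
}

/* Rewrite-style condition: the slave leaves once the monarch has exited. */
static bool slave_may_exit_new(const struct rendez_state *s)
{
	assert(s != NULL);
	return s->monarch_done;
}
```

If SAL has already consumed the wake-up bit, the old condition can never become true, while the new one does not look at IRR at all and so is unaffected by what SAL did to it.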
In your case I would say that SAL is wrong. I would argue that SAL
should not be reading cr.ivr at all, it should leave that to the OS.
The existing (2.6.13) code will not work with that SAL. My rewrite
(hopefully in 2.6.14-rc1) will work with that SAL.
* Re: [RFC] SAL_MC_RENDEZ logic
From: Hidetoshi Seto @ 2005-09-12 8:27 UTC
To: linux-ia64
Thank you for your reply, Keith.
Keith Owens wrote:
> The IRR bits are read only. The OS clears them by reading cr.ivr, in
> the external interrupt vector. The only reason that mca.c tests IRR
> directly is because at that point interrupts are disabled.
I forgot to mention, the SAL actually reads cr.ivr and writes cr.eoi.
>>I'm not sure, but it seems "if any" means that the SAL can clear
>>the IRR bits on behalf of the OS. So the OS shouldn't expect the
>>IRR bit to always be set on return from SAL_MC_RENDEZ. Is this right?
>
> The phrase "if any" is quite ambiguous, it is not clear what it means
> here.
I agree. It should be written out as a full sentence.
>>I don't know whether there is any old SAL that never spins in
>>SAL_MC_RENDEZ. Or is this the beginning of a nightmare,
>>having different MCA code depending on the SAL version?
>
> I hope not. In any case my MCA/INIT rewrite removes the spin in mca.c
> waiting for IRR to be set. Instead the slave comes out of SAL due to a
> wake up call, waits for the monarch to exit then the slaves all exit.
>
> Once a slave resumes to its normal context and interrupts are enabled
> again, then the external interrupt vector clears the wake up bit and
> calls ia64_mca_wakeup_int_handler() which is a no-op. The rendezvous
> IRR bit is cleared when we read cr.ivr prior to calling
> ia64_mca_rendez_int_handler(), i.e. this bit is already clear when we
> rendezvous.
>
> In your case I would say that SAL is wrong. I would argue that SAL
> should not be reading cr.ivr at all, it should leave that to the OS.
> The existing (2.6.13) code will not work with that SAL. My rewrite
> (hopefully in 2.6.14-rc1) will work with that SAL.
I really appreciate your work.
I'll take this problem up with the SAL developers rather than with you.
Thanks,
H.Seto
* RE: [RFC] SAL_MC_RENDEZ logic
From: John Ik Lee (WA) @ 2005-09-12 23:36 UTC
To: linux-ia64
MCA experts,
The Itanium 2 processor datasheet mentions an ETM-generated CMCI, but
none of the PAL spec, the SAL spec, or the Itanium error handling guide
has that information.
PAL_MC_ERROR_INFO and SAL_GET_STATE_INFO for processor errors do not
have any entries related to thermal events.
When a CMCI is sent, I presume there is an error record for it.
Where can I find the format or documentation for the ETM/CMC-related
error record?
The Itanium 2 processor datasheet reads:
#5.1.2 Enhanced Thermal Management
...Once the thermal sensing device observes the temperature rise above
the thermal entry point, the processor will enter a low power mode of
execution and notify the system by sending a Correctable Machine Check
Interrupt (CMCI). ...
Thanks,
John Ik Lee (J.I.)
Sr. Staff Engineer
Platform Solutions, Inc