* 8260 - Spurious interrupt when calling __sti()
From: Jean-Denis Boyer @ 2002-04-09 14:53 UTC (permalink / raw)
To: linuxppc-embedded
I have a custom board that uses an 8260 (rev. A.1 1K22A).
We have had a spurious interrupt problem for a long time.
On kernel 2.4.10, at boot up, the following message was written to the
console:
Unhandled interrupt 0, disabled
This message did not appear on kernel 2.4.18 (I don't know why),
but in /proc/interrupts, the number at the right of BAD was increasing
slowly.
When flooding the board through the network, thus generating a lot of
interrupts,
the number of BAD increased faster (but not as fast as the fenet).
[root@10.20.125.254 /]# cat /proc/interrupts
           CPU0
  4:        290   8260 SIU   Edge      uart
 33:     192388   8260 SIU   Edge      fenet
BAD:        738
Putting traces in the interrupt handler, it appeared that the interrupt
happened in '__sti()' (arch/ppc/kernel/misc.S), just after calling 'mtmsr'
to turn on the 'EE' bit.
I added a 'sync', between 'ori r3,r3,MSR_EE' and 'mtmsr r3',
and it has fixed the problem.
My questions are:
- Has anybody encountered the same problem on that core?
- Has anybody seen anything about this in the user's manual and/or the
errata?
- Is my fix correct, and should it be applied to other calls to 'mtmsr'?
--------------------------------------------
Jean-Denis Boyer, B.Eng., System Architect
Mediatrix Telecom Inc.
4229 Garlock Street
Sherbrooke (Québec)
J1L 2C8 CANADA
(819)829-8749 x241
--------------------------------------------
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* Re: 8260 - Spurious interrupt when calling __sti()
From: Ricardo Scop @ 2002-04-09 17:09 UTC (permalink / raw)
To: Jean-Denis Boyer; +Cc: linuxppc-embedded
Jean-Denis,
See comments below.
[]'s, Scop mailto:scop@vanet.com.br
------------------------------------------------------------------
"We all lie in the gutter, but some of us look up at the stars."
-- Oscar Wilde
Tuesday, April 09, 2002, 11:53:45 AM, you wrote:
JDB> I have a custom board that uses an 8260 (rev. A.1 1K22A).
Same CPU and rev, other custom board.
JDB> We have had a spurious interrupt problem for a long time.
JDB> On kernel 2.4.10, at boot up, the following message was written to the
JDB> console:
JDB> Unhandled interrupt 0, disabled
We have this with kernel 2.4.16.
JDB> This message did not appear on kernel 2.4.18 (I don't know why),
Didn't try 2.4.18, yet.
JDB> but in /proc/interrupts, the number at the right of BAD was increasing
JDB> slowly.
<snip>
JDB> Putting traces in the interrupt handler, it appeared that the interrupt
JDB> happened in '__sti()' (arch/ppc/kernel/misc.S), just after calling 'mtmsr'
JDB> to turn on the 'EE' bit.
JDB> I added a 'sync', between 'ori r3,r3,MSR_EE' and 'mtmsr r3',
JDB> and it has fixed the problem.
I'll try that, thanks.
JDB> My questions are:
JDB> - Has anybody encountered the same problem on that core?
yes.
JDB> - Has anybody seen anything about this in the user's manual and/or the
JDB> errata?
no.
JDB> - Is my fix correct, and should it be applied to other calls to 'mtmsr'?
I don't have the knowledge to answer that :-(
* Re: 8260 - Spurious interrupt when calling __sti()
From: Dan Malek @ 2002-04-09 18:26 UTC (permalink / raw)
To: Jean-Denis Boyer; +Cc: linuxppc-embedded
Jean-Denis Boyer wrote:
> On kernel 2.4.10, at boot up, the following message was written to the
> console:
>
> Unhandled interrupt 0, disabled
How often did you see these? This message indicates that a hardware
interrupt was posted to the processor, but when we read the vector
register nothing was pending. It is usually caused by a race between a
device removing its interrupt signal and interrupts being enabled.
> This message did not appear on kernel 2.4.18 (I don't know why),
> but in /proc/interrupts, the number at the right of BAD was increasing
> slowly.
I'm not sure the processor specific irq function is properly returning a
status to the do_IRQ function. In this case, the spurious counter will
be incremented without any message. I do what the comment indicates,
return a -1 if no more pending, but I think there is a hole in the code
that can erroneously update the counter. It's no big deal, since it
functionally works fine.
> Putting traces in the interrupt handler, it appeared that the interrupt
> happened in '__sti()' (arch/ppc/kernel/misc.S), just after calling 'mtmsr'
> to turn on the 'EE' bit.
Just think about this for a minute.............If there was an interrupt
pending, why are you surprised it occurs as soon as you enable interrupts
in the MSR?
> I added a 'sync', between 'ori r3,r3,MSR_EE' and 'mtmsr r3',
> and it has fixed the problem.
A 'sync' or an 'isync'? The mtmsr is supposed to be an instruction
synchronizer, so if you required an 'isync' for proper operation then it
would be a silicon mask concern. If you really added a 'sync' instruction,
this implies to me there is some driver that isn't properly synchronizing
its state with a device. An operation to acknowledge the interrupt from
the driver is stuck in the pipeline, you enable the interrupts again,
the processor is handed an interrupt, the device is acknowledged (from
the pipeline), and in the interrupt handler we don't find anything.
I would be looking for a driver bug someplace.
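To make the suspected sequence concrete, here is a toy C model of it. It is only a sketch of the hypothesis above, not of the 8260 hardware or of any real driver: 'struct model', 'driver_ack', 'drain_stores' and 'enable_interrupts' are hypothetical names invented for the illustration, and "draining" stands in for whatever a 'sync' would force to complete.

```c
/* Toy model of the hypothesis above; every name here is hypothetical. */
enum outcome { NO_INTERRUPT, REAL_INTERRUPT, SPURIOUS_INTERRUPT };

struct model {
    int line_asserted;   /* device is driving its interrupt line */
    int ack_posted;      /* driver's acknowledge store has not reached the device */
};

/* The driver acknowledges the device, but the store is only queued. */
static void driver_ack(struct model *m)
{
    m->ack_posted = 1;
}

/* What a 'sync' would do in this model: push queued stores to the device. */
static void drain_stores(struct model *m)
{
    if (m->ack_posted) {
        m->line_asserted = 0;
        m->ack_posted = 0;
    }
}

/* Set MSR[EE]: the CPU samples the line first, the queued ack lands after. */
static enum outcome enable_interrupts(struct model *m)
{
    int taken = m->line_asserted;

    drain_stores(m);
    if (!taken)
        return NO_INTERRUPT;
    /* The handler reads the vector register only now. */
    return m->line_asserted ? REAL_INTERRUPT : SPURIOUS_INTERRUPT;
}
```

In this model, calling driver_ack() and then enable_interrupts() gives SPURIOUS_INTERRUPT, while calling drain_stores() in between gives NO_INTERRUPT. That would be consistent with a 'sync' hiding the symptom, but it does not prove that this is what happens on the real part.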
Thanks.
-- Dan
* Re: 8260 - Spurious interrupt when calling __sti()
From: Dan Malek @ 2002-04-10 15:50 UTC (permalink / raw)
To: Jean-Denis Boyer; +Cc: linuxppc-embedded
Jean-Denis Boyer wrote:
> IMHO, I prefer the way it is handled in 2.4.18.
Ahhhh.....I see now. It took some time to get the _devel
updates into the stable tree. The spurious interrupt is
an artifact of the 8260 code not updated to match Ben's
generic PPC changes.
-- Dan
* RE: 8260 - Spurious interrupt when calling __sti()
From: Jean-Denis Boyer @ 2002-04-10 16:31 UTC (permalink / raw)
To: 'Dan Malek'; +Cc: linuxppc-embedded
Dan,
> > Unhandled interrupt 0, disabled
>
> How often did you see these? This message indicates that a hardware
> interrupt was posted to the processor, but when we read the vector
> register nothing was pending. It is usually caused by a race between a
> device removing its interrupt signal and interrupts being enabled.
On 2.4.10 (and 2.4.16, as someone wrote on this list), this message
appears only once. The interrupt is then flagged as 'DISABLED' and masked
(although this one is non-maskable), and the message is no longer displayed.
In /proc/interrupts, the number beside BAD is 1 and stays there.
But the spurious interrupt continues to occur. If I remove the 'DISABLED'
flag, the message is written many times (I see them by executing 'dmesg').
In /proc/interrupts, the number beside BAD grows accordingly.
On 2.4.18, the message never appears, but
in /proc/interrupts, the number beside BAD grows in the same manner.
REASON:
I looked at the patches, and this is because between 2.4.17 and 2.4.18,
in the function m8260_get_irq (arch/ppc/kernel/ppc8260_pic.c),
an 'if(irq == 0) return -1;' has been added.
And so, 'do_IRQ' increments the spurious counter without calling
'ppc_irq_dispatch_handler', the function that was generating the warning.
IMHO, I prefer the way it is handled in 2.4.18.
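The difference between the two behaviours can be sketched with a small user-space C model. This is a simplification written for illustration, not the kernel source: 'get_irq_old', 'get_irq_new', 'dispatch', 'do_irq', 'reset_model' and the 'keep_disabled_flag' parameter are hypothetical stand-ins, and only the "vector 0" path described above is modelled.

```c
#include <stdio.h>

/* Toy model only: these names are hypothetical, not the kernel's symbols. */
#define IRQ_DISABLED 0x1

static int vector_reg;            /* what the simulated vector register reads */
static unsigned irq0_status;      /* status flags for irq 0 */
static unsigned long bad_count;   /* the BAD line of /proc/interrupts */
static unsigned long warnings;    /* "Unhandled interrupt" messages printed */

/* Before 2.4.18, as described: vector 0 is passed through unchanged. */
static int get_irq_old(void)
{
    return vector_reg;
}

/* 2.4.18, as described: vector 0 is reported as "nothing pending". */
static int get_irq_new(void)
{
    int irq = vector_reg;

    if (irq == 0)
        return -1;
    return irq;
}

/* Stand-in for ppc_irq_dispatch_handler, reduced to the unhandled irq 0 path. */
static void dispatch(int irq, int keep_disabled_flag)
{
    if (irq != 0)
        return;                       /* real sources are not modelled */
    if (irq0_status & IRQ_DISABLED)
        return;                       /* already flagged: stay silent */
    printf("Unhandled interrupt %x, disabled\n", (unsigned)irq);
    warnings++;
    bad_count++;
    if (keep_disabled_flag)
        irq0_status |= IRQ_DISABLED;
}

/* Stand-in for do_IRQ: a negative irq only bumps the spurious counter. */
static void do_irq(int (*get_irq)(void), int keep_disabled_flag)
{
    int irq = get_irq();

    if (irq >= 0)
        dispatch(irq, keep_disabled_flag);
    else
        bad_count++;
}

/* Start each scenario with vector 0 pending and all counters cleared. */
static void reset_model(void)
{
    vector_reg = 0;
    irq0_status = 0;
    bad_count = 0;
    warnings = 0;
}
```

With the old get_irq and the 'DISABLED' flag kept, repeated calls to do_irq() leave one message and BAD at 1. Without the flag, both grow together. With the new get_irq, no message is printed and only BAD grows, which matches what is observed on each kernel.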
> > Putting traces in the interrupt handler, it appeared that the interrupt
> > happened in '__sti()' (arch/ppc/kernel/misc.S), just after calling 'mtmsr'
> > to turn on the 'EE' bit.
>
> Just think about this for a minute.............If there was an interrupt
> pending, why are you surprised it occurs as soon as you enable interrupts
> in the MSR?
I wouldn't be surprised, if there were REALLY a pending interrupt.
I put the trace only when the vector index is 0 (error, or no interrupt).
The NIP was always the same, exactly after the instruction 'mtmsr'.
And the link register pointed inside the 'ppc_irq_dispatch_handler'.
> A 'sync' or an 'isync'? The mtmsr is supposed to be an instruction
> synchronizer, so if you required an 'isync' for proper operation then it
> would be a silicon mask concern. If you really added a 'sync' instruction,
> this implies to me there is some driver that isn't properly synchronizing
> its state with a device. An operation to acknowledge the interrupt from
> the driver is stuck in the pipeline, you enable the interrupts again,
> the processor is handed an interrupt, the device is acknowledged (from
> the pipeline), and in the interrupt handler we don't find anything.
>
> I would be looking for a driver bug someplace.
I used a 'sync'. An 'isync' does not change anything.
I agree with you: according to the 603e documentation,
'mtmsr' is an 'execution synchronizing' instruction.
What is even stranger is that I can put the 'sync' anywhere in the '__sti'
function, that is, before the 'mfmsr', before the 'ori', or before the
'mtmsr', and the spurious interrupt problem simply disappears. Remove it,
and it reappears.
I could reproduce it using 2 interrupt sources: uart and fenet.
As I said, '__sti' is called by 'ppc_irq_dispatch_handler',
so I think we would see the problem whatever the irq source is.
If it is a pipeline concern, the instructions responsible for
the irq acknowledge would be near the 'mtmsr', am I wrong?
I could not demonstrate that.
Hard to explain...
* Re: 8260 - Spurious interrupt when calling __sti()
From: Dan Malek @ 2002-04-10 17:04 UTC (permalink / raw)
To: Jean-Denis Boyer; +Cc: 'Ricardo Scop', linuxppc-embedded
Jean-Denis Boyer wrote:
> At the left of the comment is 'SYNC', in uppercase.
That's correct. This macro is empty for the 8260.
It is needed only for those processors that don't properly 'isync'
the mtmsr instruction, which is usually engineering sample silicon
or the 601.
-- Dan
* RE: Re[2]: 8260 - Spurious interrupt when calling __sti()
From: Jean-Denis Boyer @ 2002-04-10 17:16 UTC (permalink / raw)
To: 'Ricardo Scop'; +Cc: 'Dan Malek', linuxppc-embedded
Ricardo,
> Even harder to explain is why you're having to put those syncs in
> __sti, since both my copies of 2.4.16 and 2.4.18-pre9 kernel sources
> already have them in the code... and with the comment
> "/* Some chip revs have problems here... */" !!!
At the left of the comment is 'SYNC', in uppercase.
According to "include/asm-ppc/ppc_asm.h", which defines the preprocessor
macro 'SYNC', it means something only if CONFIG_PPC601_SYNC_FIX
is defined. The comment seems to be old and imprecise.
After compiling the kernel, I disassembled it, just to be sure... ;-)
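For readers who have not met this pattern, the following is a minimal, self-contained C sketch of a macro that only produces text when a config option is defined. The expansion shown ('sync') is an assumption made for the illustration and is not copied from "include/asm-ppc/ppc_asm.h"; 'STRINGIFY', 'sync_expansion' and 'sync_is_empty' are hypothetical helpers used only to make the result visible.

```c
/* Hypothetical sketch: the expansion below is an assumption,
 * not copied from include/asm-ppc/ppc_asm.h. */

/* Uncomment to model a kernel built with the 601 fix enabled. */
/* #define CONFIG_PPC601_SYNC_FIX */

#ifdef CONFIG_PPC601_SYNC_FIX
#define SYNC sync
#else
#define SYNC            /* expands to nothing */
#endif

#define STRINGIFY_(x) #x
#define STRINGIFY(x)  STRINGIFY_(x)   /* expand SYNC first, then stringify */

/* Text that SYNC contributes to the source after preprocessing. */
static const char *sync_expansion(void)
{
    return STRINGIFY(SYNC);
}

/* True when SYNC emits no instruction at all. */
static int sync_is_empty(void)
{
    return sync_expansion()[0] == '\0';
}
```

With the option left undefined, as it would be for an 8260 build, sync_is_empty() returns 1: the 'SYNC' in the source is visible to the reader but contributes nothing to the object code, which is what the disassembly shows.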
* RE: 8260 - Spurious interrupt when calling __sti()
From: Jean-Denis Boyer @ 2002-04-10 18:30 UTC (permalink / raw)
To: 'Dan Malek'; +Cc: linuxppc-embedded
Dan,
> Ahhhh.....I see now. It took some time to get the _devel
> updates into the stable tree. The spurious interrupt is
> an artifact of the 8260 code not updated to match Ben's
> generic PPC changes.
Doh! Yes, sorry. I was referring to the stable tree.
What I can see by comparing with the _devel tree is that,
for the 2.4.18 release,
"ppc8260_pic.c" was correctly updated,
but "irq.c" was not. The 'while' loop is absent.
However, this apparent 'out of sync' has nothing to do
with the fact that the invalid interrupt pops up, right ?
* Re: 8260 - Spurious interrupt when calling __sti()
From: Dan Malek @ 2002-04-10 18:33 UTC (permalink / raw)
To: Jean-Denis Boyer; +Cc: linuxppc-embedded
Jean-Denis Boyer wrote:
> However, this apparent 'out of sync' has nothing to do
> with the fact that the invalid interrupt pops up, right ?
If it happens only once at boot, there must be some device
with a dangling interrupt not properly initialized.
-- Dan
* RE: 8260 - Spurious interrupt when calling __sti()
From: Ron Bianco @ 2002-04-10 20:24 UTC (permalink / raw)
To: linuxppc-embedded
For OpenPIC/EPIC hardware:
I don't think this detracts from the validity of the points made in this
discussion so far, but failing to do the following gave us grief with 'initial'
spurious interrupts for a while on a pre-2.4.x port for the 8240.
It is important that the bootrom code reset the OpenPIC by software before
setting the mode and enabling it.
For the 8240 this is done by writing 0xA0000000 to Global Configuration
register 0.
There are a few other required steps after that too.
We had this code in our kernel for a while:
#if 0 // don't need this, it is done by boot loader now
    openpic_write(&OpenPIC->Global.Global_Configuration0, 0xa0000000); /* GCR - reset epic */
    openpic_write(&OpenPIC->Global.Global_Configuration0, 0x20000000); /* GCR - mixed mode */
    t = openpic_read(&OpenPIC->Global.Global_Configuration1);
    openpic_write(&OpenPIC->Global.Global_Configuration1, t & 0xf7ffffff); /* EICR - direct */
    while (openpic_readfield(&OpenPIC->THIS_CPU.Interrupt_Acknowledge,
                             OPENPIC_VECTOR_MASK) != 0x000000ff)
        ;
#endif
I'm not yet familiar with the scope of OpenPIC initialization in the 2.4.x
kernels.
But somewhere during startup this procedure is necessary to avoid the one-time,
initial spurious interrupt problem.
If this is old news to anyone, please excuse the redundancy.
Ron
* Re: 8260 - Spurious interrupt when calling __sti()
From: Dan Malek @ 2002-04-10 21:25 UTC (permalink / raw)
To: Ron Bianco; +Cc: linuxppc-embedded
Ron Bianco wrote:
> For OpenPIC/EPIC hardware:
The 8260 family doesn't use EPIC. It has a custom integrated controller.
Your point is taken that in general someone has to ensure devices and
interrupt controllers are properly initialized. I thought I did this
in the kernel, but perhaps something was missed. Sometimes, a boot rom
can leave something in a weird state (like the I2C for reading an EEPROM)
that the kernel doesn't always use.
Thanks.
-- Dan
* Re: 8260 - Spurious interrupt when calling __sti()
From: Val Henson @ 2002-04-15 18:33 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded
On Wed, Apr 10, 2002 at 05:25:21PM -0400, Dan Malek wrote:
>
> Ron Bianco wrote:
>
> > For OpenPIC/EPIC hardware:
>
> The 8260 family doesn't use EPIC. It has a custom integrated controller.
And here I thought we were the only PPC embedded board to go that
route. I feel better.
-VAL
* Re: 8260 - Spurious interrupt when calling __sti()
From: Dan Malek @ 2002-04-16 18:58 UTC (permalink / raw)
To: Val Henson; +Cc: linuxppc-embedded
Val Henson wrote:
> And here I thought we were the only PPC embedded board to go that
> route. I feel better.
It doesn't have anything to do with the board, it's the design of
the integrated peripherals with the CPU core.
-- Dan