linuxppc-dev.lists.ozlabs.org archive mirror
* RE: Re[2]: 8260 - Spurious interrupt when calling __sti()
@ 2002-04-10 17:16 Jean-Denis Boyer
  2002-04-10 17:04 ` Dan Malek
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Denis Boyer @ 2002-04-10 17:16 UTC (permalink / raw)
  To: 'Ricardo Scop'; +Cc: 'Dan Malek', linuxppc-embedded


Ricardo,

> Even harder to explain is why you're having to put those syncs in
> __sti, since both my copies of 2.4.16 and 2.4.18-pre9 kernel sources
> already have them in the code... and with the comment
> "/* Some chip revs have problems here... */" !!!

To the left of the comment is 'SYNC', in uppercase.
According to "include/asm-ppc/ppc_asm.h", which defines the preprocessor
macro 'SYNC', it expands to actual instructions only if
CONFIG_PPC601_SYNC_FIX is defined; otherwise it expands to nothing.
The comment seems to be old and imprecise.
After compiling the kernel, I disassembled it, just to be sure... ;-)
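
The effect of a conditionally defined macro like this can be sketched in
plain C. This is a minimal illustration, not the kernel header: only the
names SYNC and CONFIG_PPC601_SYNC_FIX come from the message above. The
'sync; isync' expansion in the enabled branch and the helpers
sync_expansion() and sync_is_noop() are assumptions made for the demo.

```c
#include <string.h>

/* Two-level stringification so that SYNC is expanded before it is quoted. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

#ifdef CONFIG_PPC601_SYNC_FIX
/* Assumed expansion, for illustration only; the real header may differ. */
#define SYNC sync; isync
#else
/* Without the config option the macro expands to nothing at all. */
#define SYNC
#endif

/* Returns the text SYNC expands to in this build ("" when it is empty). */
const char *sync_expansion(void)
{
    return STRINGIFY(SYNC);
}

/* Hypothetical helper: non-zero when SYNC would emit no instructions. */
int sync_is_noop(void)
{
    return strlen(sync_expansion()) == 0;
}
```

Built without -DCONFIG_PPC601_SYNC_FIX, sync_is_noop() reports that the
macro is empty, which is the situation a disassembly would confirm.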


--------------------------------------------
 Jean-Denis Boyer, B.Eng., System Architect
 Mediatrix Telecom Inc.
 4229 Garlock Street
 Sherbrooke (Québec)
 J1L 2C8  CANADA
 (819)829-8749 x241
--------------------------------------------


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

* RE: 8260 - Spurious interrupt when calling __sti()
@ 2002-04-10 18:30 Jean-Denis Boyer
  2002-04-10 18:33 ` Dan Malek
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Denis Boyer @ 2002-04-10 18:30 UTC (permalink / raw)
  To: 'Dan Malek'; +Cc: linuxppc-embedded


Dan,

> Ahhhh.....I see now.  It took some time to get the _devel
> updates into the stable tree.  The spurious interrupt is
> an artifact of the 8260 code not updated to match Ben's
> generic PPC changes.

Doh! Yes, sorry. I was referring to the stable tree.
What I can see by comparing with the _devel tree
is that, for the 2.4.18 release,
"ppc8260_pic.c" was correctly updated,
but "irq.c" was not: the 'while' loop is absent.

However, this apparent 'out of sync' state has nothing to do
with the fact that the invalid interrupt pops up, right?


* RE: 8260 - Spurious interrupt when calling __sti()
@ 2002-04-10 16:31 Jean-Denis Boyer
  2002-04-10 15:50 ` Dan Malek
  0 siblings, 1 reply; 13+ messages in thread
From: Jean-Denis Boyer @ 2002-04-10 16:31 UTC (permalink / raw)
  To: 'Dan Malek'; +Cc: linuxppc-embedded


Dan,

> >   Unhandled interrupt 0, disabled
>
> How often did you see these?  This message will indicate there was a
> hardware interrupt posted to the processor but when we read the vector
> register nothing was pending.  Usually race conditions between devices
> removing the interrupt signal and interrupts being enabled.

On 2.4.10 (and 2.4.16, as someone wrote on this list), this message
appears only once. The interrupt is then flagged as 'DISABLED' and masked
(although this one is non-maskable), so the message is no longer displayed.
In /proc/interrupts, the number beside BAD is 1 and stays there.

But the spurious interrupt continues to occur. If I remove the 'DISABLED'
flag, the message is written many times (I see them by running 'dmesg').
In /proc/interrupts, the number beside BAD grows accordingly.

On 2.4.18, the message never appears, but
in /proc/interrupts, the number beside BAD grows in the same manner.

REASON:
I looked at the patches, and this is because between 2.4.17 and 2.4.18,
in the function m8260_get_irq (arch/ppc/kernel/ppc8260_pic.c),
an 'if(irq == 0) return -1;' has been added.
And so, 'do_IRQ' increments the spurious counter without calling
'ppc_irq_dispatch_handler', the function that was generating the warning.
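
The difference between the two behaviours described above can be modeled
in a few lines of C. This is a simplified sketch based only on the
description in this message, not the kernel code: struct irq_model and
all model_* functions are hypothetical stand-ins for do_IRQ,
m8260_get_irq and ppc_irq_dispatch_handler.

```c
#include <stdbool.h>

struct irq_model {
    unsigned long bad;       /* counter shown beside BAD in /proc/interrupts */
    unsigned long warnings;  /* "Unhandled interrupt 0, disabled" messages */
    bool irq0_disabled;      /* set once vector 0 has been flagged DISABLED */
};

/* Stand-in for the dispatch handler when nothing is registered for irq 0:
 * it warns and counts once, then the DISABLED flag silences it. */
static void model_dispatch(struct irq_model *m, int irq)
{
    if (irq == 0 && !m->irq0_disabled) {
        m->warnings++;
        m->bad++;
        m->irq0_disabled = true;
    }
}

/* Older behaviour as described: vector 0 is handed to the dispatch handler. */
void model_do_irq_old(struct irq_model *m, int vector)
{
    model_dispatch(m, vector);
}

/* 2.4.18 behaviour as described: a vector of 0 is reported as -1. */
static int model_get_irq_new(int vector)
{
    if (vector == 0)
        return -1;
    return vector;
}

/* With -1 the spurious counter is bumped and the dispatch handler,
 * and therefore the warning, is skipped. */
void model_do_irq_new(struct irq_model *m, int vector)
{
    int irq = model_get_irq_new(vector);

    if (irq < 0) {
        m->bad++;
        return;
    }
    model_dispatch(m, irq);
}
```

Feeding several vector-0 events through both paths reproduces the
symptoms reported here: the old path warns once and leaves BAD at 1,
while the new path never warns and BAD keeps growing.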

IMHO, I prefer the way it is handled in 2.4.18.


> > Putting traces in the interrupt handler, it appeared that the interrupt
> > happened in '__sti()' (arch/ppc/kernel/misc.S), just after calling 'mtmsr'
> > to turn on the 'EE' bit.
>
> Just think about this for a minute.............If there was an interrupt
> pending, why are you surprised it occurs as soon as you enable interrupts
> in the MSR?

I wouldn't be surprised, if there were REALLY a pending interrupt.
I put the trace only when the vector index is 0 (error, or no interrupt).
The NIP was always the same, exactly after the instruction 'mtmsr'.
And the link register pointed inside the 'ppc_irq_dispatch_handler'.


> A 'sync' or an 'isync'?  The mtmsr is supposed to be an instruction
> synchronizer, so if you required an 'isync' for proper operation then it
> would be a silicon mask concern.  If you really added a 'sync' instruction,
> this implies to me there is some driver that isn't properly synchronizing
> its state with a device.  An operation to acknowledge the interrupt from
> the driver is stuck in the pipeline, you enable the interrupts again,
> the processor is handed an interrupt, the device is acknowledged (from
> the pipeline), and in the interrupt handler we don't find anything.
>
> I would be looking for a driver bug someplace.
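
The race hypothesized in the quoted text can be modeled with a toy state
machine. This is only a sketch of that hypothesis, which the thread does
not confirm as the root cause. struct machine and every function below
are hypothetical, and "draining posted writes" is an assumed stand-in for
what a 'sync' would guarantee.

```c
#include <stdbool.h>

struct machine {
    bool device_irq_asserted;  /* interrupt signal as seen by the CPU */
    bool ack_write_posted;     /* driver's acknowledge still in flight */
    unsigned spurious;         /* interrupts taken with nothing pending */
};

/* The driver acknowledges the device, but the store is only posted. */
void driver_ack(struct machine *m)
{
    m->ack_write_posted = true;
}

/* Assumed effect of a 'sync': outstanding stores reach the device. */
void drain_posted_writes(struct machine *m)
{
    if (m->ack_write_posted) {
        m->device_irq_asserted = false;
        m->ack_write_posted = false;
    }
}

/* Setting EE: the CPU takes an interrupt if the signal is still asserted.
 * The in-flight acknowledge then lands before the handler reads the
 * vector register, so the handler finds nothing pending. */
void enable_interrupts(struct machine *m)
{
    if (!m->device_irq_asserted)
        return;                 /* nothing signalled, no interrupt taken */
    drain_posted_writes(m);     /* the acknowledge completes "late" */
    if (!m->device_irq_asserted)
        m->spurious++;
}

/* Runs one acknowledge-then-enable cycle and returns the spurious count. */
unsigned run_scenario(bool use_sync)
{
    struct machine m = { true, false, 0 };

    driver_ack(&m);
    if (use_sync)
        drain_posted_writes(&m);
    enable_interrupts(&m);
    return m.spurious;
}
```

In this model run_scenario(false) takes one spurious interrupt and
run_scenario(true) takes none, which matches the shape of the argument
but says nothing about whether the real hardware behaves this way.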

I've put a 'sync'. An 'isync' does not change anything.
I agree with you, according to the 603e documentation,
the 'mtmsr' is an 'execution synchronizing' instruction.

Even stranger, I can put the 'sync' anywhere in the '__sti' function
(before the 'mfmsr', before the 'ori', or before the 'mtmsr')
and the spurious interrupt problem simply disappears. Remove it, and it
reappears.

I could reproduce it using 2 interrupt sources: uart and fenet.
As I said, '__sti' is called by 'ppc_irq_dispatch_handler',
so I think we would see the problem whatever the irq source is.

If it is a pipeline concern, the instructions that are responsible for
the irq acknowledge would be near the 'mtmsr', am I wrong?
I could not demonstrate that.

Hard to explain...

* 8260 - Spurious interrupt when calling __sti()
@ 2002-04-09 14:53 Jean-Denis Boyer
  2002-04-09 17:09 ` Ricardo Scop
  2002-04-09 18:26 ` Dan Malek
  0 siblings, 2 replies; 13+ messages in thread
From: Jean-Denis Boyer @ 2002-04-09 14:53 UTC (permalink / raw)
  To: linuxppc-embedded


I have a custom board that uses an 8260 (rev. A.1 1K22A).
We have had a spurious interrupt problem for a long time.
On kernel 2.4.10, at boot up, the following message was written to the
console:

  Unhandled interrupt 0, disabled

This message did not appear on kernel 2.4.18 (I don't know why),
but in /proc/interrupts, the number to the right of BAD was increasing
slowly.

When flooding the board through the network, thus generating a lot of
interrupts, the BAD count increased faster (but not as fast as the
fenet count).

[root@10.20.125.254 /]# cat /proc/interrupts
           CPU0
  4:        290   8260 SIU   Edge      uart
 33:     192388   8260 SIU   Edge      fenet
BAD:        738

Putting traces in the interrupt handler, it appeared that the interrupt
happened in '__sti()' (arch/ppc/kernel/misc.S), just after calling 'mtmsr'
to turn on the 'EE' bit.

I added a 'sync', between 'ori r3,r3,MSR_EE' and 'mtmsr r3',
and it has fixed the problem.
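
For reference, the sequence discussed here can be summarized as a small C
model. The instruction listing in the comment is reconstructed from the
description in this message ('mfmsr', 'ori r3,r3,MSR_EE', 'mtmsr r3'),
not copied from misc.S. The helper names are hypothetical; MSR_EE is the
external interrupt enable bit of the PowerPC MSR (0x8000).

```c
#include <stdint.h>

#define MSR_EE 0x8000u  /* external interrupt enable bit in the MSR */

/*
 * Sequence as described, with the added 'sync':
 *
 *   mfmsr r3            read the current MSR
 *   ori   r3,r3,MSR_EE  set the EE bit
 *   sync                addition reported to make the problem go away
 *   mtmsr r3            write the MSR back, enabling interrupts
 */

/* Value written back by the sequence: the old MSR with EE set. */
uint32_t sti_new_msr(uint32_t msr)
{
    return msr | MSR_EE;
}

/* Non-zero when external interrupts are enabled in the given MSR value. */
int interrupts_enabled(uint32_t msr)
{
    return (msr & MSR_EE) != 0;
}
```

The model only shows which bit changes; it cannot reproduce the timing
behaviour that the 'sync' influences on real hardware.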

My questions are:
 - Has anybody encountered the same problem on that core?
 - Has anybody seen something about that in the user's manual and/or the
errata?
 - Is my fix correct, and should it be applied to other calls to 'mtmsr'?




end of thread, other threads:[~2002-04-16 18:58 UTC | newest]

Thread overview: 13+ messages
2002-04-10 17:16 Re[2]: 8260 - Spurious interrupt when calling __sti() Jean-Denis Boyer
2002-04-10 17:04 ` Dan Malek
  -- strict thread matches above, loose matches on Subject: below --
2002-04-10 18:30 Jean-Denis Boyer
2002-04-10 18:33 ` Dan Malek
2002-04-10 20:24   ` Ron Bianco
2002-04-10 21:25     ` Dan Malek
2002-04-15 18:33       ` Val Henson
2002-04-16 18:58         ` Dan Malek
2002-04-10 16:31 Jean-Denis Boyer
2002-04-10 15:50 ` Dan Malek
2002-04-09 14:53 Jean-Denis Boyer
2002-04-09 17:09 ` Ricardo Scop
2002-04-09 18:26 ` Dan Malek
