Linux PARISC architecture development
* Longstanding bug in our IRQ code (irqbalance HPMCs parisc SMP machines)
@ 2010-04-24 13:29 Thibaut VARÈNE
  2010-04-24 14:36 ` Grant Grundler
  0 siblings, 1 reply; 6+ messages in thread
From: Thibaut VARÈNE @ 2010-04-24 13:29 UTC
  To: linux-parisc

Pa-ckers,

Just for the record, I'd like to draw some attention to what seems
like a pretty old bug in our IRQ code that is apparently still
affecting us.

Long story short: while trying to figure out why the recently attached
10-disk bay was killing the Debian "lafayette" autobuilder during raid
resync, I noticed that irqbalance was part of the default Debian
autobuilder setup.

The nastiness of irqbalance has been discussed before, and I
remembered having had issues in the past (5+ years ago) on my parisc
machines with that daemon. I couldn't find a pointer to a mailing-list
thread, and I don't remember whether I discussed it on IRC or elsewhere.

Anyway, it turned out that disabling irqbalance "fixed" the crash (and
by crash I mean HPMC). IIRC, the general idea is that when irqbalance
reroutes IRQs under heavy interrupt load, a race occurs in which an
interrupt request might end up delivered to the wrong CPU, HPMC'ing
the machine.

I have no particular opinion on whether it should be expected that
something as stupid as irqbalance could crash a system, but others
seem to believe it shouldn't (claiming "it works on *real* [read: x86]
hardware").

Now, I'm quite convinced that irqbalance could be one of the (major?)
causes of instability of the parisc autobuilders. AFAIU, they've
decided to disable it on their setup, so maybe the situation will
improve there. Still, irqbalance is only the messenger, and I'm
wondering whether that apparent bug in our IRQ code could also be
responsible for other issues we're still having.

It's been a very long time since I last touched that code, and tbh I
never fully mastered it anyway, but I thought it'd be a good thing to
have a trace that this bug is still there, and maybe it will ring a
bell with others...

HTH

T-Bone

--
Thibaut Varène
http://www.parisc-linux.org/~varenet/

Thread overview: 6+ messages
2010-04-24 13:29 Longstanding bug in our IRQ code (irqbalance HPMCs parisc SMP machines) Thibaut VARÈNE
2010-04-24 14:36 ` Grant Grundler
2010-04-24 14:44   ` Thibaut VARÈNE
2010-04-24 15:13     ` Grant Grundler
2010-04-24 15:48     ` John David Anglin
2010-04-24 16:44       ` PTE/TLB issues (was Re: Longstanding bug in our IRQ code (irqbalance HPMCs parisc SMP machines)) Thibaut VARÈNE
