From: Segher Boessenkool <segher@kernel.crashing.org>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: linuxppc-dev@ozlabs.org, paulus@samba.org,
Yoder Stuart-B08248 <stuart.yoder@freescale.com>
Subject: Re: [PATCH] powerpc: document new interrupt-array property
Date: Tue, 27 Feb 2007 12:49:26 +0100 [thread overview]
Message-ID: <e974c705d1088b60aa021856b8fb95a6@kernel.crashing.org> (raw)
In-Reply-To: <20070227034541.GD1861@localhost.localdomain>
> Segher, think for a moment instead of just arguing.
I do that already, thank you. I just don't look at
the problem from the same angle as you it seems.
> There just isn't
> enough information available for the kernel to do sanity checking when
> there is an apparently valid 'interrupts' property. Consider:
It can't do a 100% job (of course; if it could it wouldn't
need a device tree at all since it would be omniscient), but
it still can do quite a reasonable job.
> Interrupt controllers are generally initialized with all interrupts
> masked (yes, not always, but usually).
Yes, and one of the first things I do when debugging interrupt
stuff is this turn all interrupts on since this makes debugging
much much easier.
> So, if a client mistakenly
> believes a device has no interrupts, those interrupts will never be
> configured, and the CPU will never see those interrupts. This is only
> going to cause a problem if there is an active driver which is
> expecting interrupts. But if there's a driver expecting interrupts,
> it must at some point earlier have attempted to configure the
> interrupts (if the client is the kernel, that's a request_irq()). In
> order to configure the interrupt, it would have parsed the device tree
> to find data about the interrupt. In doing so it would have run into
> the lack of 'interrupts' property.
Yes, give or take some details / ordering of events.
> There's a good chance at this point it will just print an error saying
> "Huh? Where's my interrupt" and abort driver initialization. If it
> doesn't do that, it's very likely it will immediately crash attempting
> to dereference or parse the non-existant property.
That would be a plain simple bug.
> Either way, the
> problem shows up at the point we're attempting to parse the interrupt
> tree, and will be pretty easy to debug.
Arguably the whole interrupt tree should be parsed at kernel
start, not at request_irq() time.
And this isn't the end of the story; the kernel won't say
"huh where's my IRQ" but it will try a few more options
first (at least on PCI, it can be true on other buses as
well in principle) and quite likely it will return some
bad number.
> Now, a different case. Suppose we're using the 'interrupts' /
> 'interrupt-parents' approach. We have a board with two identical
> interrupt controllers, cascaded. It has a network device with two
> interrupts, the first is end-of-packet and is routed to the top-level
> IC, the second signals a fairly rare error condition and is routed to
> the cascaded IC. The network device sits under a bridge which has a
> single interrupt routed to the primary IC (and thus has an
> 'interrupt-parent' property). So, to an old-style parser it looks
> like the network device has two interrupts on the primary controller,
> routed via the bridge.
>
> When the network driver initializes, it requests its irqs, correctly
> configures the first, and misconfigures the second (because it follows
> the interrupt tree old-style and assumes they're all routed to the
> primary IC).
And very likely ends up with a conflict on that second interrupt
since some other device uses it as well. Stuff will complain
at initialisation time still and all is fine.
> It sends and receives packets fine, then the error
> condition happens, but the recovery ISR is never called and the
> network suddenly stops at some random time after startup. Programmer,
> baffled, tries half-a-dozen theories before noticing the error status
> bit and going "but why didn't we get an interrupt?".
That would be the programmer who programmed the device tree
(unless he doesn't test his work) so I can't see why he would
be baffled.
Also this is why I like to have all interrupts enabled
on all controllers, you notice the bunch you forgot
about. If the argument for doing the opposite is "but
it gets really noisy if some unclaimed interrupts happen",
well, that is a very serious bug, right? Just like you
don't want random DMAs happen, you don't want random
interrupts flying around either.
> Or suppose the second interrupt signals a (fairly unimportant) status
> change, level-sensitive. The network driver works just fine. Then
> along comes another driver that shares an interrupt with the second
> network driver interrupt.
It cannot share that interrupt unless the network driver (and
the other device's driver) say it can be shared. And they
should not, it's level triggered after all.
> It crashes with an unhandled interrupt on
> startup if-and-only-if the network driver has had a status change
> event before the second driver started. This is common on some
> networks and rare on others. Bafflement all around...
>
> Or for that matter, the network driver could crash with an unhandled
> interrupt when the device which is really using what the network
> driver thinks is its second irq, generates an interrupt. When that
> happens could depend on that other device, its driver, the board
> configuration, then network or other external environment...
Yeah, funky interrupt problems are a bitch to resolve,
aren't they. But the interrupt can't be shared so this
case cannot happen either.
> And those are just the first 3 recipes for utter confusion I can come
> up.
Oh, I can think of many more, and I've seen most of them.
Now back to the meat of the matter:
Whenever you're writing a device tree, after every small
change to the tree you should check it for validity (by
hand or some checking tool), and see if it works on all
kernel versions you claim to support too (not quite the
same thing, heh). If things go wrong you know what change
caused it.
Some other developer (who didn't change the device tree)
can't run into problems, unless he uses an unsupported
kernel version. That's his own fault :-)
And, lastly, the most important point that you conveniently
snipped off on your reply:
>> Anyway, it's all a question of deployment: you just
>> have to make sure your users use a new enough kernel
>> with your device tree (and device), which you have
>> to do *anyway*. Also DTS files are still conveniently
>> shipped with the kernel so it's even less of a problem.
If you care about compatibility to random kernel versions
instead, you'd better not do surgery on the interrupt
mapping binding at all but just put an extra interrupt
nexus in your device tree.
Segher
next prev parent reply other threads:[~2007-02-27 11:49 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-02-21 23:25 [PATCH] powerpc: document new interrupt-array property Stuart Yoder
2007-02-22 0:29 ` Kumar Gala
2007-02-22 1:18 ` David Gibson
2007-02-22 7:01 ` Segher Boessenkool
2007-02-22 10:34 ` David Gibson
2007-02-22 11:06 ` Segher Boessenkool
2007-02-22 15:47 ` Yoder Stuart-B08248
2007-02-22 17:09 ` Segher Boessenkool
2007-02-23 19:15 ` Yoder Stuart-B08248
2007-02-23 21:30 ` Segher Boessenkool
2007-02-23 21:57 ` Yoder Stuart-B08248
2007-02-23 22:30 ` Segher Boessenkool
2007-02-24 6:42 ` Benjamin Herrenschmidt
2007-02-24 6:40 ` Benjamin Herrenschmidt
2007-02-24 11:24 ` Segher Boessenkool
2007-02-26 4:16 ` David Gibson
2007-02-26 5:36 ` Segher Boessenkool
2007-02-26 13:08 ` David Gibson
2007-02-26 14:26 ` Segher Boessenkool
2007-02-27 2:32 ` David Gibson
2007-02-27 2:52 ` Segher Boessenkool
2007-02-27 3:45 ` David Gibson
2007-02-27 11:49 ` Segher Boessenkool [this message]
2007-02-28 0:40 ` David Gibson
2007-02-28 1:00 ` Segher Boessenkool
2007-02-28 6:40 ` Benjamin Herrenschmidt
2007-02-26 16:53 ` Yoder Stuart-B08248
2007-02-22 22:57 ` David Gibson
2007-02-23 0:07 ` Segher Boessenkool
2007-02-23 0:33 ` David Gibson
2007-02-23 0:50 ` Segher Boessenkool
2007-02-23 16:07 ` Yoder Stuart-B08248
2007-02-23 16:14 ` Kumar Gala
2007-02-23 17:00 ` Segher Boessenkool
2007-02-23 16:55 ` Segher Boessenkool
2007-02-23 17:01 ` Yoder Stuart-B08248
2007-02-23 17:51 ` Segher Boessenkool
2007-02-22 22:48 ` David Gibson
2007-02-23 0:25 ` Segher Boessenkool
2007-02-24 6:30 ` Benjamin Herrenschmidt
2007-02-24 11:16 ` Segher Boessenkool
2007-02-22 7:19 ` Segher Boessenkool
2007-02-24 6:35 ` Benjamin Herrenschmidt
2007-02-24 11:11 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e974c705d1088b60aa021856b8fb95a6@kernel.crashing.org \
--to=segher@kernel.crashing.org \
--cc=david@gibson.dropbear.id.au \
--cc=linuxppc-dev@ozlabs.org \
--cc=paulus@samba.org \
--cc=stuart.yoder@freescale.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).