From: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
To: Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: "devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Ruslan Ruslichenko
<rruslich-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>,
"linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
kernel-F5mvAk5X5gdBDgjK7y7TUQ@public.gmane.org,
Sean Young <sean-hENCXIMQXOg@public.gmane.org>,
wfg-VuQAYsv1563Yd54FQh9/CA@public.gmane.org,
Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Mauro Carvalho Chehab
<mchehab-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
Linux LED Subsystem
<linux-leds-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-input-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-input-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
kernel test robot
<fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
LKP <lkp-JC7UmRfGjtg@public.gmane.org>,
"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
Linux Media Mailing List
<linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
Date: Mon, 27 Feb 2017 17:12:16 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1702271647570.4732@nanos>
In-Reply-To: <20170227154124.GA20569-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Mon, 27 Feb 2017, Ingo Molnar wrote:
> * Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> wrote:
>
> > The pending interrupt issue happens, at least on my test boxen, mostly on
> > the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
> > happen occasionally.
> >
> >
> > - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
> > one of the affected machines this stops when the interrupt system is
> > switched to interrupt remapping !?!?!?
> >
> > - Spurious interrupts on various interrupt lines, which are triggered by
> > IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
> > that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
> > triggered when the IOAPIC >=16 interrupt fires.
> >
> > - Spurious interrupt caused by driver probing itself. I.e. the driver
> > probing code causes an interrupt issued from the device
> > inadvertently. That happens even on IRQ >= 16.
> >
> > This problem might be handled by the device driver code itself, but
> > that's going to be ugly. See below.
>
> That's pretty colorful behavior...
>
> > We can try to sample more data from the machines of affected users, but I doubt
> > that it will give us more information than confirming that we really have to
> > deal with all that hardware wreckage out there in some way or the other.
>
> BTW., instead of trying to avoid the scenario, how about moving in the other
> direction: making CONFIG_DEBUG_SHIRQ=y an unconditional property in the IRQ core code
> starting from v4.12 or so, i.e. requiring device driver IRQ handlers to handle the
> invocation of IRQ handlers at pretty much any moment. (We could also extend it a
> bit, such as invoking IRQ handlers early after suspend/resume wakeup.)
>
> Because it's not the requirement that hurts primarily, but the resulting
> non-determinism and the sporadic crashes. Which can be solved by making the race
> deterministic via the debug facility.
>
> If the IRQ handler crashed the moment it was first written by the driver author
> we'd never see these problems.
Yes, I'd love to do that. That's just a nightmare as well.

See commit 6d83f94db95cf, which added the _FIXME suffix to that code.
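
To make the requirement concrete, here is a rough user-space model of what "the handler may be invoked at pretty much any moment" means for a driver. This is only a sketch, not kernel code: struct toy_dev, toy_handler() and toy_request_irq_debug() are made-up names, and the debug helper simply calls the handler once at registration time, which is an assumption about how such a check could behave.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real kernel API. */
typedef enum { IRQ_NONE, IRQ_HANDLED } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

struct toy_dev {
    int *rx_buf;    /* NULL until the driver has finished its setup */
    int events;     /* number of interrupts actually processed */
};

/* Handler which copes with being called before setup is complete. */
static irqreturn_t toy_handler(int irq, void *dev_id)
{
    struct toy_dev *dev = dev_id;

    (void)irq;

    /* Invoked too early: nothing to touch yet, so report "not ours". */
    if (!dev->rx_buf)
        return IRQ_NONE;

    dev->rx_buf[0]++;
    dev->events++;
    return IRQ_HANDLED;
}

/* Models a debug variant of registration: call the handler right away. */
static irqreturn_t toy_request_irq_debug(int irq, irq_handler_t handler,
                                         void *dev_id)
{
    return handler(irq, dev_id);
}
```

A handler without the NULL check would dereference rx_buf on the very first, deliberate invocation, so the problem would show up the first time the driver author loads the code instead of sporadically in the field.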
So recently I tried to invoke the primary handler, which causes another
issue:

Some of the low level code (e.g. IOAPIC interrupt migration, but also
some PPC irq chip machinery) depends on being called in hard interrupt
context. They invoke get_irq_regs(), which obviously does not work from
thread context.

So I removed that one from -next as well and postponed it another time. And
I should have known before I tried it that it does not work. Simply because
of that stuff x86 cannot use the software based resend mechanism.

Still trying to wrap my head around a proper solution for the problem. On
x86 we might just check whether we are really in hard irq context and
otherwise skip the part which depends on get_irq_regs(). That would be a
sane thing to do. Have not yet looked at the PPC side of affairs, whether
that's easy to solve as well. But even if it is, then there might be still
other magic code in some irq chip drivers which relies on things which are
only available/correct when actually invoked by a hardware interrupt.
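
The x86 idea can be sketched as a small user-space model. It is an illustration only and not the actual irq chip code: all toy_* names are hypothetical, the "hard irq context" flag is a plain variable, and the model assumes that the saved register pointer is valid only between interrupt entry and exit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct toy_regs {
    unsigned long ip;
};

static bool toy_hardirq;               /* models "in hard irq context" */
static struct toy_regs *toy_cur_regs;  /* valid only while toy_hardirq is set */
static unsigned long toy_last_ip;      /* what the regs dependent part recorded */

/* Models get_irq_regs(): returns NULL outside of interrupt entry/exit. */
static struct toy_regs *toy_get_irq_regs(void)
{
    return toy_cur_regs;
}

/* Models an irq chip callback which wants the interrupted register state. */
static int toy_chip_callback(void)
{
    /* Replayed from thread context: skip the regs dependent part. */
    if (!toy_hardirq)
        return 0;

    toy_last_ip = toy_get_irq_regs()->ip;
    return 1;
}

/* Models hardware interrupt entry: publish regs, run callback, clean up. */
static int toy_hw_entry(struct toy_regs *regs, int (*fn)(void))
{
    int ret;

    toy_hardirq = true;
    toy_cur_regs = regs;
    ret = fn();
    toy_cur_regs = NULL;
    toy_hardirq = false;
    return ret;
}
```

Calling toy_chip_callback() directly stands for the software replay from thread context and returns 0 without touching the regs, while going through toy_hw_entry() returns 1. Whether skipping that part is acceptable for the real callbacks is exactly the open question.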
Not only the hardware has colorful behaviour ....

Thanks,

    tglx