[WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
       [not found] <58b07b30.9XFLj9Hhl7F6HMc2%fengguang.wu@intel.com>
@ 2017-02-24 19:15 ` Linus Torvalds
  2017-02-25  9:07   ` Ingo Molnar
  2017-02-25 11:14   ` Sean Young
  0 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2017-02-24 19:15 UTC (permalink / raw)
  To: linux-arm-kernel

Added more relevant people. I've debugged the immediate problem below,
but I think there's another problem that actually triggered this.

On Fri, Feb 24, 2017 at 10:28 AM, kernel test robot
<fengguang.wu@intel.com> wrote:
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit ff58d005cd10fcd372787cceac547e11cf706ff6
> Merge: 5ab3566 9eeb0ed
>
>     Merge tag 'media/v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
[...]
> [    4.664940] rc rc0: lirc_dev: driver ir-lirc-codec (rc-loopback) registered at minor = 0
> [    4.666322] BUG: unable to handle kernel NULL pointer dereference at 0000039c
> [    4.666675] IP: serial_ir_irq_handler+0x189/0x410

This merge being fingered ends up being a subtle interaction with other changes.

Those "other changes" are (again) the interrupt retrigger code that
was reverted for 4.10, and then we tried to merge them again this
merge window.

Because the immediate cause is:

> [    4.666675] EIP: serial_ir_irq_handler+0x189/0x410
> [    4.666675] Call Trace:
> [    4.666675]  <IRQ>
> [    4.666675]  __handle_irq_event_percpu+0x57/0x100
> [    4.666675]  handle_irq_event_percpu+0x1d/0x50
> [    4.666675]  handle_irq_event+0x32/0x60
> [    4.666675]  handle_edge_irq+0xa5/0x120
> [    4.666675]  handle_irq+0x9d/0xd0
> [    4.666675]  </IRQ>
> [    4.666675]  do_IRQ+0x5f/0x130
> [    4.666675]  common_interrupt+0x33/0x38
> [    4.666675] EIP: hardware_init_port+0x3f/0x190
> [    4.666675] EFLAGS: 00200246 CPU: 0
> [    4.666675] EAX: c718990f EBX: 00000000 ECX: 00000000 EDX: 000003f9
> [    4.666675] ESI: 000003f9 EDI: 000003f8 EBP: c0065d98 ESP: c0065d84
> [    4.666675]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [    4.666675]  serial_ir_probe+0xbb/0x300
> [    4.666675]  platform_drv_probe+0x48/0xb0
...

ie an interrupt came in immediately after the request_irq(), before
all the data was properly set up, which then causes the interrupt
handler to take a fault because it tries to access some field that
hasn't even been set up yet.

The code line is helpful, the faulting instruction is

      mov    0x39c(%rax),%eax   <--- fault
      call ..
      mov    someglobalvar,%edx

which together with the supplied config file makes me able to match it
up with the assembly generation around it:

        inb %dx, %al    # tmp254, value
        andb    $1, %al #, tmp255
        testb   %al, %al        # tmp255
        je      .L233   #,
  .L215:
        movl    serial_ir+8, %eax       # serial_ir.rcdev, serial_ir.rcdev
        xorl    %edx, %edx      # _66->timeout
        movl    924(%eax), %eax # _66->timeout, _66->timeout
        call    nsecs_to_jiffies        #
        movl    jiffies, %edx   # jiffies, jiffies.33_70
        addl    %eax, %edx      # _69, tmp259
        movl    $serial_ir+16, %eax     #,
        call    mod_timer       #
        movl    serial_ir+8, %eax       # serial_ir.rcdev,
        call    ir_raw_event_handle     #
        movl    $1, %eax        #, <retval>

so it's that "serial_ir.rcdev->timeout" access that faults. So this is
the faulting source code:

drivers/media/rc/serial_ir.c: 402

        mod_timer(&serial_ir.timeout_timer,
                  jiffies + nsecs_to_jiffies(serial_ir.rcdev->timeout));

        ir_raw_event_handle(serial_ir.rcdev);

        return IRQ_HANDLED;

and serial_ir.rcdev is NULL when ti tries to look up the timeout.

That mod_timer() is new as of commit 2940c7e49775 ("[media] serial_ir:
generate timeout") and *that* is the actual new bug.

Looking at the code, that serial_ir.rcdev thing is initialized fairly
late in serial_ir_init_module(), while the interrupt is allocated
early in serial_ir_probe(), which is done _early_ in
serial_ir_init_module():

serial_ir_init_module -> serial_ir_init -> platform_driver_register ->
serial_ir_probe -> devm_request_irq

Mauro, Sean, please fix.

Anyway, this is clearly a bug in the serial_ir code, but it is *also*
once again clearly now being *triggered* due to the irq handling
changes.

I'm pretty sure that the thing that triggered this is once more commit
a9b4f08770b4 ("x86/ioapic: Restore IO-APIC irq_chip retrigger
callback") which seems to retrigger stale irqs that simply should not
be retriggered.

They aren't actually active any more, if they ever were.

So that commit seems to act like a random CONFIG_DEBUG_SHIRQ. It's
good for testing, but not good for actual users.

I the local APIC retrigger just unconditionally resends that irq. But
it's the core interrupt code that decides to retrigger it incorrectly
for some reason.

Why is IRQS_PENDING set for that thing? Something must have almost
certainly set it, despite the irq not actually having ever been
pending. Thomas?

                  Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-24 19:15 ` [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c Linus Torvalds
@ 2017-02-25  9:07   ` Ingo Molnar
  2017-02-25 18:02     ` Linus Torvalds
  2017-02-25 11:14   ` Sean Young
  1 sibling, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2017-02-25  9:07 UTC (permalink / raw)
  To: linux-arm-kernel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm pretty sure that the thing that triggered this is once more commit
> a9b4f08770b4 ("x86/ioapic: Restore IO-APIC irq_chip retrigger
> callback") which seems to retrigger stale irqs that simply should not
> be retriggered.
> 
> They aren't actually active any more, if they ever were.
> 
> So that commit seems to act like a random CONFIG_DEBUG_SHIRQ. It's
> good for testing, but not good for actual users.

Yeah, so some distros like Fedora already have CONFIG_DEBUG_SHIRQ=y enabled, but 
part of the problem is that CONFIG_DEBUG_SHIRQ=y has this:

#ifdef CONFIG_DEBUG_SHIRQ_FIXME
        if (!retval && (irqflags & IRQF_SHARED)) {
                /*
                 * It's a shared IRQ -- the driver ought to be prepared for it
                 * to happen immediately, so let's make sure....
                 * We disable the irq to make sure that a 'real' IRQ doesn't
                 * run in parallel with our fake.
                 */
                unsigned long flags;

                disable_irq(irq);
                local_irq_save(flags);

                handler(irq, dev_id);

                local_irq_restore(flags);
                enable_irq(irq);
        }
#endif

Note that the '_FIXME' postfix effectively turns off this particular debug check 
...

Thomas and me realized this risk a week ago ago, and tried to resurrect full 
CONFIG_DEBUG_SHIRQ=y functionality to more reliably trigger these problems:

  https://lkml.org/lkml/2017/2/15/341

... but were forced to revert that fix because it's not working on x86 yet (it's 
crashing). We also thought we fixed the problems exposed in drivers, as the 
retrigger changes have been in -tip and -next for some time, but were clearly too 
optimistic about that.

So, should we revert the hw-retrigger change:

  a9b4f08770b4 x86/ioapic: Restore IO-APIC irq_chip retrigger callback

... until we managed to fix CONFIG_DEBUG_SHIRQ=y? If you'd like to revert it 
upstream straight away:

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-25  9:07   ` Ingo Molnar
@ 2017-02-25 18:02     ` Linus Torvalds
  2017-02-27 10:09       ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2017-02-25 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Feb 25, 2017 at 1:07 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> So, should we revert the hw-retrigger change:
>
>   a9b4f08770b4 x86/ioapic: Restore IO-APIC irq_chip retrigger callback
>
> ... until we managed to fix CONFIG_DEBUG_SHIRQ=y? If you'd like to revert it
> upstream straight away:
>
> Acked-by: Ingo Molnar <mingo@kernel.org>

So I'm in no huge hurry to revert that commit as long as we're still
in the merge window or early -rc's.

>From a debug standpoint, the spurious early interrupts are fine, and
hopefully will help us find more broken drivers.

It's just that I'd like to revert it before the actual 4.11 release,
unless we can find a better solution.

Because it really seems like the interrupt re-trigger is entirely
bogus. It's not an _actual_ "re-trigger the interrupt that may have
gotten lost", it's some code that ends up triggering it for no good
reason.

So I'd actually hope that we could figure out why IRQS_PENDING got
set, and perhaps fix the underlying cause?

There are several things that set IRQS_PENDING, ranging from "try to
test mis-routed interrupts while irqd was working", to "prepare for
suspend losing the irq for us", to "irq auto-probing uses it on
unassigned probable irqs".

The *actual* reason to re-send, namely getting a nested irq that we
had to drop because we got a second one while still handling the first
(or because it was disabled), is just one case.

Personally, I'd suspect some left-over state from auto-probing earlier
in the boot, but I don't know. Could we fix that underlying issue?

                 Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-25 18:02     ` Linus Torvalds
@ 2017-02-27 10:09       ` Thomas Gleixner
  2017-02-27 12:32         ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-27 10:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 25 Feb 2017, Linus Torvalds wrote:
> On Sat, Feb 25, 2017 at 1:07 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So, should we revert the hw-retrigger change:
> >
> >   a9b4f08770b4 x86/ioapic: Restore IO-APIC irq_chip retrigger callback
> >
> > ... until we managed to fix CONFIG_DEBUG_SHIRQ=y? If you'd like to revert it
> > upstream straight away:
> >
> > Acked-by: Ingo Molnar <mingo@kernel.org>
> 
> So I'm in no huge hurry to revert that commit as long as we're still
> in the merge window or early -rc's.
> 
> From a debug standpoint, the spurious early interrupts are fine, and
> hopefully will help us find more broken drivers.
> 
> It's just that I'd like to revert it before the actual 4.11 release,
> unless we can find a better solution.
> 
> Because it really seems like the interrupt re-trigger is entirely
> bogus. It's not an _actual_ "re-trigger the interrupt that may have
> gotten lost", it's some code that ends up triggering it for no good
> reason.
> 
> So I'd actually hope that we could figure out why IRQS_PENDING got
> set, and perhaps fix the underlying cause?
>
> There are several things that set IRQS_PENDING, ranging from "try to
> test mis-routed interrupts while irqd was working", to "prepare for
> suspend losing the irq for us", to "irq auto-probing uses it on
> unassigned probable irqs".
> 
> The *actual* reason to re-send, namely getting a nested irq that we
> had to drop because we got a second one while still handling the first
> (or because it was disabled), is just one case.
> 
> Personally, I'd suspect some left-over state from auto-probing earlier
> in the boot, but I don't know. Could we fix that underlying issue?

I'm on it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 10:09       ` Thomas Gleixner
@ 2017-02-27 12:32         ` Thomas Gleixner
  2017-02-27 15:41           ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-27 12:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 27 Feb 2017, Thomas Gleixner wrote:
> On Sat, 25 Feb 2017, Linus Torvalds wrote:
> > There are several things that set IRQS_PENDING, ranging from "try to
> > test mis-routed interrupts while irqd was working", to "prepare for
> > suspend losing the irq for us", to "irq auto-probing uses it on
> > unassigned probable irqs".
> > 
> > The *actual* reason to re-send, namely getting a nested irq that we
> > had to drop because we got a second one while still handling the first
> > (or because it was disabled), is just one case.
> > 
> > Personally, I'd suspect some left-over state from auto-probing earlier
> > in the boot, but I don't know. Could we fix that underlying issue?
> 
> I'm on it.

Adding a few trace points to the irq code gives me a pretty consistent
picture across a bunch of different machines.

The pending interrupt issue happens, at least on my test boxen, mostly on
the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
happen occasionally.

 - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
   one of the affected machines this stops when the interrupt system is
   switched to interrupt remapping !?!?!?

 - Spurious interrupts on various interrupt lines, which are triggered by
   IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
   that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
   triggered when the IOAPIC >=16 interrupt fires.

 - Spurious interrupt caused by driver probing itself. I.e. the driver
   probing code causes an interrupt issued from the device
   inadvertently. That happens even on IRQ >= 16.

   This problem might be handled by the device driver code itself, but
   that's going to be ugly. See below.

None of that was caused by misrouted irq or autoprobing. Autoprobing is not
used on any modern maschine at all and the misrouted mechanism cannot
set the pending flag on a interrupt which has no action installed.

Sure, we might not set the pending flag on spurious interrupts, when an
interrupt has no action assigned. We had it that way quite some years
ago. But that caused issues on a couple of (non x86) systems because the
peripheral which sent the spurious interrupt was waiting for
acknowledgement forever and therefor not sending new interrupts. There was
no way to handle that other than resending the interrupt after the action
was installed. The reason for this is the following race:

Device driver init()

  ack_bogus_pending_irq_in_device();
                          <---- Device sends spurious interrupt
  request_irq();

We cannot call ack_bogus_pending_irq_in_device() after the handler has been
installed because it might clear a real interrupt before the handler is
invoked. In fact we can, but that requires pretty ugly code. See below.

Yes, it's broken hardware, but in that area there is more broken than sane
hardware.

The other point here is, that the "IOAPIC triggers spurious interrupt on
legacy lines" issue can actually cause the same problem as the immediate
resend/retrigger:

  request_irq();
  			 <---- Other device sends IOAPIC interrupt which
			       triggers the legacy line.

If the handler is not prepared@that point, then it explodes as
well. Same for this odd IRQ0 -> IRQ7 trigger, which I can observe on two
machines.

The whole resend/retrigger business is only relevant for edge type
interrupts to deal with the following problem:

  disable_irq();
			<---- Device sends interrupt
  enable_irq();

  ===> Previously sent interrupt has not been acked in the device so no
       further interrupts happen. We lost an edge and are toast.

We never do the resend/retrigger for level type interrupts, because they
are resent by hardware if the interrupt is still pending, which is the case
when you don't acknowlegde it at the device level.

So now you might argue that a driver could handle this by doing:

  disable_irq();
			<---- Device sends interrupt
  enable_irq();

  lock_device();
   do_some_magic_checks_and_handle_pending_irq();
  unlock_device();

Requiring this would add tons of even more broken code to the device
drivers and I really doubt that we want to go there. Same for the
request_irq() case.

I rather prefer that drivers are able to deal with spurious interrupts even
on non shared interrupt lines and let the resend/retrigger mechanism expose
them early when they do not.

We can try to sample more data from the machines of affected users, but I
doubt that it will give us more information than confirming that we really
have to deal with all that hardware wreckage out there in some way or the
other.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 12:32         ` Thomas Gleixner
@ 2017-02-27 15:41           ` Ingo Molnar
  2017-02-27 16:07             ` Tony Lindgren
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ingo Molnar @ 2017-02-27 15:41 UTC (permalink / raw)
  To: linux-arm-kernel


* Thomas Gleixner <tglx@linutronix.de> wrote:

> The pending interrupt issue happens, at least on my test boxen, mostly on
> the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
> happen occasionally.
>
> 
>  - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
>    one of the affected machines this stops when the interrupt system is
>    switched to interrupt remapping !?!?!?
> 
>  - Spurious interrupts on various interrupt lines, which are triggered by
>    IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
>    that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
>    triggered when the IOAPIC >=16 interrupt fires.
> 
>  - Spurious interrupt caused by driver probing itself. I.e. the driver
>    probing code causes an interrupt issued from the device
>    inadvertently. That happens even on IRQ >= 16.
> 
>    This problem might be handled by the device driver code itself, but
>    that's going to be ugly. See below.

That's pretty colorful behavior...

> We can try to sample more data from the machines of affected users, but I doubt 
> that it will give us more information than confirming that we really have to 
> deal with all that hardware wreckage out there in some way or the other.

BTW., instead of trying to avoid the scenario, wow about moving in the other 
direction: making CONFIG_DEBUG_SHIRQ=y unconditional property in the IRQ core code 
starting from v4.12 or so, i.e. requiring device driver IRQ handlers to handle the 
invocation of IRQ handlers at pretty much any moment. (We could also extend it a 
bit, such as invoking IRQ handlers early after suspend/resume wakeup.)

Because it's not the requirement that hurts primarily, but the resulting 
non-determinism and the sporadic crashes. Which can be solved by making the race 
deterministic via the debug facility.

If the IRQ handler crashed the moment it was first written by the driver author 
we'd never see these problems.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 15:41           ` Ingo Molnar
@ 2017-02-27 16:07             ` Tony Lindgren
  2017-02-27 16:18               ` Thomas Gleixner
  2017-02-27 16:12             ` Thomas Gleixner
  2017-02-27 19:23             ` Linus Torvalds
  2 siblings, 1 reply; 17+ messages in thread
From: Tony Lindgren @ 2017-02-27 16:07 UTC (permalink / raw)
  To: linux-arm-kernel

* Ingo Molnar <mingo@kernel.org> [170227 07:44]:
> 
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > The pending interrupt issue happens, at least on my test boxen, mostly on
> > the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
> > happen occasionally.
> >
> > 
> >  - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
> >    one of the affected machines this stops when the interrupt system is
> >    switched to interrupt remapping !?!?!?
> > 
> >  - Spurious interrupts on various interrupt lines, which are triggered by
> >    IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
> >    that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
> >    triggered when the IOAPIC >=16 interrupt fires.
> > 
> >  - Spurious interrupt caused by driver probing itself. I.e. the driver
> >    probing code causes an interrupt issued from the device
> >    inadvertently. That happens even on IRQ >= 16.

This sounds a lot like what we saw few weeks ago with dwc3. See commit
12a7f17fac5b ("usb: dwc3: omap: fix race of pm runtime with irq handler in
probe"). It was caused by runtime PM and -EPROBE_DEFER, see the description
Grygorii wrote up in that commit.

> >    This problem might be handled by the device driver code itself, but
> >    that's going to be ugly. See below.
> 
> That's pretty colorful behavior...
> 
> > We can try to sample more data from the machines of affected users, but I doubt 
> > that it will give us more information than confirming that we really have to 
> > deal with all that hardware wreckage out there in some way or the other.
> 
> BTW., instead of trying to avoid the scenario, wow about moving in the other 
> direction: making CONFIG_DEBUG_SHIRQ=y unconditional property in the IRQ core code 
> starting from v4.12 or so, i.e. requiring device driver IRQ handlers to handle the 
> invocation of IRQ handlers at pretty much any moment. (We could also extend it a 
> bit, such as invoking IRQ handlers early after suspend/resume wakeup.)
> 
> Because it's not the requirement that hurts primarily, but the resulting 
> non-determinism and the sporadic crashes. Which can be solved by making the race 
> deterministic via the debug facility.
> 
> If the IRQ handler crashed the moment it was first written by the driver author 
> we'd never see these problems.

Just in case this is PM related.. Maybe the spurious interrupt is pending
from earlier? This could be caused by glitches on the lines with runtime PM,
or a pending interrupt during suspend/resume. In that case IRQ_DISABLE_UNLAZY
might provide more clues if the problem goes away.

Regards,

Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 16:07             ` Tony Lindgren
@ 2017-02-27 16:18               ` Thomas Gleixner
  2017-02-27 16:26                 ` Tony Lindgren
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-27 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 27 Feb 2017, Tony Lindgren wrote:
> * Ingo Molnar <mingo@kernel.org> [170227 07:44]:
> > Because it's not the requirement that hurts primarily, but the resulting 
> > non-determinism and the sporadic crashes. Which can be solved by making the race 
> > deterministic via the debug facility.
> > 
> > If the IRQ handler crashed the moment it was first written by the driver author 
> > we'd never see these problems.
> 
> Just in case this is PM related.. Maybe the spurious interrupt is pending
> from earlier? This could be caused by glitches on the lines with runtime PM,
> or a pending interrupt during suspend/resume. In that case IRQ_DISABLE_UNLAZY
> might provide more clues if the problem goes away.

It's not PM related.  That's just silly hardware. At the moment when you
enable some magic bit in the control register, which is required to probe
the version, the fricking thing spits out a spurious interrupt despite the
interrupt enable bit in the same control register being still disabled. Of
course we cannot install an interrupt handler before having probed the
version and setup other stuff, except we add magic 'if (!initialized)'
crappola into the handler and lose the ability to install version dependent
handlers afterwards.

Wonderful crap that, isn't it?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 16:18               ` Thomas Gleixner
@ 2017-02-27 16:26                 ` Tony Lindgren
  0 siblings, 0 replies; 17+ messages in thread
From: Tony Lindgren @ 2017-02-27 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

* Thomas Gleixner <tglx@linutronix.de> [170227 08:20]:
> On Mon, 27 Feb 2017, Tony Lindgren wrote:
> > * Ingo Molnar <mingo@kernel.org> [170227 07:44]:
> > > Because it's not the requirement that hurts primarily, but the resulting 
> > > non-determinism and the sporadic crashes. Which can be solved by making the race 
> > > deterministic via the debug facility.
> > > 
> > > If the IRQ handler crashed the moment it was first written by the driver author 
> > > we'd never see these problems.
> > 
> > Just in case this is PM related.. Maybe the spurious interrupt is pending
> > from earlier? This could be caused by glitches on the lines with runtime PM,
> > or a pending interrupt during suspend/resume. In that case IRQ_DISABLE_UNLAZY
> > might provide more clues if the problem goes away.
> 
> It's not PM related.  That's just silly hardware. At the moment when you
> enable some magic bit in the control register, which is required to probe
> the version, the fricking thing spits out a spurious interrupt despite the
> interrupt enable bit in the same control register being still disabled. Of
> course we cannot install an interrupt handler before having probed the
> version and setup other stuff, except we add magic 'if (!initialized)'
> crappola into the handler and lose the ability to install version dependent
> handlers afterwards.

OK and presumably no -EPROBE_DEFER happening either.

> Wonderful crap that, isn't it?

Sounds broken..

Regards,

Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 15:41           ` Ingo Molnar
  2017-02-27 16:07             ` Tony Lindgren
@ 2017-02-27 16:12             ` Thomas Gleixner
  2017-02-27 19:23             ` Linus Torvalds
  2 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-27 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 27 Feb 2017, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > The pending interrupt issue happens, at least on my test boxen, mostly on
> > the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16
> > happen occasionally.
> >
> > 
> >  - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On
> >    one of the affected machines this stops when the interrupt system is
> >    switched to interrupt remapping !?!?!?
> > 
> >  - Spurious interrupts on various interrupt lines, which are triggered by
> >    IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets
> >    that the legacy PCI interrupt (which is used when IOAPIC is disabled) is
> >    triggered when the IOAPIC >=16 interrupt fires.
> > 
> >  - Spurious interrupt caused by driver probing itself. I.e. the driver
> >    probing code causes an interrupt issued from the device
> >    inadvertently. That happens even on IRQ >= 16.
> > 
> >    This problem might be handled by the device driver code itself, but
> >    that's going to be ugly. See below.
> 
> That's pretty colorful behavior...
> 
> > We can try to sample more data from the machines of affected users, but I doubt 
> > that it will give us more information than confirming that we really have to 
> > deal with all that hardware wreckage out there in some way or the other.
> 
> BTW., instead of trying to avoid the scenario, wow about moving in the other 
> direction: making CONFIG_DEBUG_SHIRQ=y unconditional property in the IRQ core code 
> starting from v4.12 or so, i.e. requiring device driver IRQ handlers to handle the 
> invocation of IRQ handlers at pretty much any moment. (We could also extend it a 
> bit, such as invoking IRQ handlers early after suspend/resume wakeup.)
> 
> Because it's not the requirement that hurts primarily, but the resulting 
> non-determinism and the sporadic crashes. Which can be solved by making the race 
> deterministic via the debug facility.
> 
> If the IRQ handler crashed the moment it was first written by the driver author 
> we'd never see these problems.

Yes, I'd love to do that. That's just a nightmare as well.

See commit 6d83f94db95cf, which added the _FIXME suffix to that code.

So recently I tried to invoke the primary handler, which causes another
issue:

  Some of the low level code (e.g. IOAPIC interrupt migration, but also
  some PPC irq chip machinery) depends on being called in hard interrupt
  context. They invoke get_irq_regs(), which obviously does not work from
  thread context.

So I removed that one from -next as well and postponed it another time. And
I should have known before I tried it that it does not work. Simply because
of that stuff x86 cannot use the software based resend mechanism.

Still trying to wrap my head around a proper solution for the problem. On
x86 we might just check whether we are really in hard irq context and
otherwise skip the part which depends on get_irq_regs(). That would be a
sane thing to do. Have not yet looked at the PPC side of affairs, whether
that's easy to solve as well. But even if it is, then there might be still
other magic code in some irq chip drivers which relies on things which are
only available/correct when actually invoked by a hardware interrupt.

Not only the hardware has colorful behaviour ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 15:41           ` Ingo Molnar
  2017-02-27 16:07             ` Tony Lindgren
  2017-02-27 16:12             ` Thomas Gleixner
@ 2017-02-27 19:23             ` Linus Torvalds
  2017-02-28  7:25               ` Ingo Molnar
  2017-02-28 10:51               ` Thomas Gleixner
  2 siblings, 2 replies; 17+ messages in thread
From: Linus Torvalds @ 2017-02-27 19:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 27, 2017 at 7:41 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> BTW., instead of trying to avoid the scenario, wow about moving in the other
> direction: making CONFIG_DEBUG_SHIRQ=y unconditional property in the IRQ core code
> starting from v4.12 or so

The problem is that it's generally almost undebuggable ahead of time
by developers, and most users won't be able to do good reports either,
because the symptom is geberally a boot-time crash, often with no
logs.

So this option is *not* good for actual users. It's been tried before.

It's a wonderful thing for developers to run with to make sure the
drivers they are working on are resilient to this problem, but we have
too many legacy drivers and lots of random users, and it's unrealistic
to expect them to handle it.

In other words: what will happen is that distros start getting bootup
problem reports six months or a year after we've done it, and *if*
they figure out it's the irq enabling, they'll disable it, because
they have no way to solve it either.

And core developers will just maybe see the occasional "4.12 doesn't
boot for me" reports, but by then developers will ahve moved on to
4.16 or something.

So I don't disagree that in a perfect world all drivers should just
handle it. It's just that it's not realistic.

The fact that we have now *twice* gotten an oops report or a "this
machine doesn't boot" report etc within a week or so of merging the
problematic patch does *not* indicate that it's easy to fix or rare.

Quite the reverse.

It indicates that it's just rare enough that core developers don't see
it, but it's common enough to have triggered issues in random places.

And it will just get *much* worse when you then get the random
end-users that usually have older machines than the developers who
actually test daily development -git trees.

Then we'll just hit *other* random places, and without having testers
that are competent and willing or able to bisect or debug.

                  Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 19:23             ` Linus Torvalds
@ 2017-02-28  7:25               ` Ingo Molnar
  2017-02-28 10:51               ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2017-02-28  7:25 UTC (permalink / raw)
  To: linux-arm-kernel

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> In other words: what will happen is that distros start getting bootup problem 
> reports six months or a year after we've done it, and *if* they figure out it's 
> the irq enabling, they'll disable it, because they have no way to solve it 
> either.
> 
> And core developers will just maybe see the occasional "4.12 doesn't boot for 
> me" reports, but by then developers will ahve moved on to 4.16 or something.

Yeah, you are right, there's over 2,100 request_irq() calls in the kernel and 
perhaps only 1% of them gets tested on real hardware by the time a change gets 
upstream :-/

So in theory we could require all *new* drivers handle this properly, as new 
drivers tend to come through developers who can fix such bugs - which would at 
least guarantee that with time the problem would obsolete itself.

But I cannot see an easy non-intrusive way to do that - we'd have to rename all 
existing request_irq() calls:

 - We could rename request_irq() to request_irq_legacy() - which does not do the 
   tests.

 - The 'new' request_irq() function would do the tests unconditionally.

... and that's just too much churn - unless you think it's worth it, or if anyone 
can think of a better method to phase in the new behavior without affecting old 
users.

Another, less intrusive method would be to introduce a new request_irq_shared() 
API call, mark request_irq() obsolete (without putting warnings into the build 
though), and put a check into checkpatch that warns about request_irq() use.

The flip side would be that:

 - request_irq() is such a nice and well-known name to waste

 - plus request_irq_shared() is a misnomer, as this has nothing to do with sharing 
   IRQs, it's about getting IRQs in unexpected moments.

I'd rather do the renaming that is easy to automate and the pain is one time only.

Or revert the retrigger change and muddle through, although as per Thomas's 
observations spurious interrupts are very common.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-27 19:23             ` Linus Torvalds
  2017-02-28  7:25               ` Ingo Molnar
@ 2017-02-28 10:51               ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-28 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 27 Feb 2017, Linus Torvalds wrote:
> So I don't disagree that in a perfect world all drivers should just
> handle it. It's just that it's not realistic.
> 
> The fact that we have now *twice* gotten an oops report or a "this
> machine doesn't boot" report etc within a week or so of merging the
> problematic patch does *not* indicate that it's easy to fix or rare.
> 
> Quite the reverse.
> 
> It indicates that it's just rare enough that core developers don't see
> it, but it's common enough to have triggered issues in random places.
> 
> And it will just get *much* worse when you then get the random
> end-users that usually have older machines than the developers who
> actually test daily development -git trees.

I tend to disagree.

The retrigger mechanism has been there forever, at least the history git
tree which goes back to 2.5.0 has it and as it was in the initial commit is
it was there in 2.4 already.

We broke that in 4.2 when the x86 interrupt mechanism was reworked
completely. That went pretty much unnoticed until somebody moved from 4.1
to 4.9 and discovered that edge interrupts got lost. It's not surprising
that it went unnoticed because lots of stuff moved towards MSI (which has
the retrigger still enabled) and the devices which are prone to the 'lost
edge' issue are limited.

So we had that retrigger exposing crappy older drivers to the spurious
interrupt until 2 years ago. The two drivers which have been exposed by
bringing the retrigger back are post 4.2 or have been wreckaged post 4.2.

Due to the fact that the old drivers have been exposed over many years to
the spurious retrigger (that "feature" exists on old hardware as well) I
don't think it's much of a problem.

We also had the DEBUG_SHIRQ active until we had to disable it in
6d83f94db95cf (2.6.38) but not because it exposed crappy interrupt
handlers. We had to disable due to a functional problem described in the
commit message. I know for sure that distros had enabled DEBUG_SHIRQ (and
some still have the reduced functionality of it enabled).

What I find more problematic is:

 - to keep this 'lost edge' regression around, which really can render
   hardware useless.

 - to ignore the fact that these buggy interrupt handlers can be exposed by
   spurious interrupt events (as I showed in the other mail) "naturaly",
   but with a way smaller probability. If a user runs into such a spurious
   problem he has absolutely NO chance to debug it at all. Exposing it by
   the retrigger mechanism makes the detection more "reliable" and allows
   debugging it. The backtraces are pretty telling and clear.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c
  2017-02-24 19:15 ` [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c Linus Torvalds
  2017-02-25  9:07   ` Ingo Molnar
@ 2017-02-25 11:14   ` Sean Young
  2017-02-25 11:28     ` [PATCH] [media] serial_ir: ensure we're ready to receive interrupts Sean Young
  1 sibling, 1 reply; 17+ messages in thread
From: Sean Young @ 2017-02-25 11:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 24, 2017 at 11:15:51AM -0800, Linus Torvalds wrote:
> Added more relevant people. I've debugged the immediate problem below,
> but I think there's another problem that actually triggered this.
> 
> On Fri, Feb 24, 2017 at 10:28 AM, kernel test robot
> <fengguang.wu@intel.com> wrote:
> >
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >
> > commit ff58d005cd10fcd372787cceac547e11cf706ff6
> > Merge: 5ab3566 9eeb0ed
> >
> >     Merge tag 'media/v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> [...]
> > [    4.664940] rc rc0: lirc_dev: driver ir-lirc-codec (rc-loopback) registered at minor = 0
> > [    4.666322] BUG: unable to handle kernel NULL pointer dereference at 0000039c
> > [    4.666675] IP: serial_ir_irq_handler+0x189/0x410
> 
> This merge being fingered ends up being a subtle interaction with other changes.
> 
> Those "other changes" are (again) the interrupt retrigger code that
> was reverted for 4.10, and then we tried to merge them again this
> merge window.
> 
> Because the immediate cause is:
> 
> > [    4.666675] EIP: serial_ir_irq_handler+0x189/0x410
> > [    4.666675] Call Trace:
> > [    4.666675]  <IRQ>
> > [    4.666675]  __handle_irq_event_percpu+0x57/0x100
> > [    4.666675]  handle_irq_event_percpu+0x1d/0x50
> > [    4.666675]  handle_irq_event+0x32/0x60
> > [    4.666675]  handle_edge_irq+0xa5/0x120
> > [    4.666675]  handle_irq+0x9d/0xd0
> > [    4.666675]  </IRQ>
> > [    4.666675]  do_IRQ+0x5f/0x130
> > [    4.666675]  common_interrupt+0x33/0x38
> > [    4.666675] EIP: hardware_init_port+0x3f/0x190
> > [    4.666675] EFLAGS: 00200246 CPU: 0
> > [    4.666675] EAX: c718990f EBX: 00000000 ECX: 00000000 EDX: 000003f9
> > [    4.666675] ESI: 000003f9 EDI: 000003f8 EBP: c0065d98 ESP: c0065d84
> > [    4.666675]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > [    4.666675]  serial_ir_probe+0xbb/0x300
> > [    4.666675]  platform_drv_probe+0x48/0xb0
> ...
> 
> ie an interrupt came in immediately after the request_irq(), before
> all the data was properly set up, which then causes the interrupt
> handler to take a fault because it tries to access some field that
> hasn't even been set up yet.

Oh dear. I've pointed out others making the same mistake when doing code
reviews, clearly I need review my own code better.

> 
> The code line is helpful, the faulting instruction is
> 
>       mov    0x39c(%rax),%eax   <--- fault
>       call ..
>       mov    someglobalvar,%edx
> 
> which together with the supplied config file makes me able to match it
> up with the assembly generation around it:
> 
>         inb %dx, %al    # tmp254, value
>         andb    $1, %al #, tmp255
>         testb   %al, %al        # tmp255
>         je      .L233   #,
>   .L215:
>         movl    serial_ir+8, %eax       # serial_ir.rcdev, serial_ir.rcdev
>         xorl    %edx, %edx      # _66->timeout
>         movl    924(%eax), %eax # _66->timeout, _66->timeout
>         call    nsecs_to_jiffies        #
>         movl    jiffies, %edx   # jiffies, jiffies.33_70
>         addl    %eax, %edx      # _69, tmp259
>         movl    $serial_ir+16, %eax     #,
>         call    mod_timer       #
>         movl    serial_ir+8, %eax       # serial_ir.rcdev,
>         call    ir_raw_event_handle     #
>         movl    $1, %eax        #, <retval>
> 
> so it's that "serial_ir.rcdev->timeout" access that faults. So this is
> the faulting source code:
> 
> drivers/media/rc/serial_ir.c: 402
> 
>         mod_timer(&serial_ir.timeout_timer,
>                   jiffies + nsecs_to_jiffies(serial_ir.rcdev->timeout));
> 
>         ir_raw_event_handle(serial_ir.rcdev);
> 
>         return IRQ_HANDLED;
> 
> and serial_ir.rcdev is NULL when ti tries to look up the timeout.

ir_raw_event_handle() call will also go bang if passed a null pointer, so
this problem existed before (since v4.10).

Thanks for debugging this, I'll send a patch as a reply to this email.


Sean

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] [media] serial_ir: ensure we're ready to receive interrupts
  2017-02-25 11:14   ` Sean Young
@ 2017-02-25 11:28     ` Sean Young
  2017-02-25 13:34       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Young @ 2017-02-25 11:28 UTC (permalink / raw)
  To: linux-arm-kernel

When the interrupt requested with devm_request_irq(), serial_ir.rcdev
is still null so will cause null deference if the irq handler is called
early on.

Also ensure that timeout_timer is setup.

Link: http://lkml.kernel.org/r/CA+55aFxsh2uF8gi5sN_guY3Z+tiLv7LpJYKBw+y8vqLzp+TsnQ at mail.gmail.com

Cc: <stable@vger.kernel.org> # 4.10
Signed-off-by: Sean Young <sean@mess.org>
---
 drivers/media/rc/serial_ir.c | 243 +++++++++++++++++++++----------------------
 1 file changed, 118 insertions(+), 125 deletions(-)

diff --git a/drivers/media/rc/serial_ir.c b/drivers/media/rc/serial_ir.c
index 923fb22..22144b4 100644
--- a/drivers/media/rc/serial_ir.c
+++ b/drivers/media/rc/serial_ir.c
@@ -487,74 +487,6 @@ static void serial_ir_timeout(unsigned long arg)
 	ir_raw_event_handle(serial_ir.rcdev);
 }
 
-static int serial_ir_probe(struct platform_device *dev)
-{
-	int i, nlow, nhigh, result;
-
-	result = devm_request_irq(&dev->dev, irq, serial_ir_irq_handler,
-				  share_irq ? IRQF_SHARED : 0,
-				  KBUILD_MODNAME, &hardware);
-	if (result < 0) {
-		if (result == -EBUSY)
-			dev_err(&dev->dev, "IRQ %d busy\n", irq);
-		else if (result == -EINVAL)
-			dev_err(&dev->dev, "Bad irq number or handler\n");
-		return result;
-	}
-
-	/* Reserve io region. */
-	if ((iommap &&
-	     (devm_request_mem_region(&dev->dev, iommap, 8 << ioshift,
-				      KBUILD_MODNAME) == NULL)) ||
-	     (!iommap && (devm_request_region(&dev->dev, io, 8,
-			  KBUILD_MODNAME) == NULL))) {
-		dev_err(&dev->dev, "port %04x already in use\n", io);
-		dev_warn(&dev->dev, "use 'setserial /dev/ttySX uart none'\n");
-		dev_warn(&dev->dev,
-			 "or compile the serial port driver as module and\n");
-		dev_warn(&dev->dev, "make sure this module is loaded first\n");
-		return -EBUSY;
-	}
-
-	setup_timer(&serial_ir.timeout_timer, serial_ir_timeout,
-		    (unsigned long)&serial_ir);
-
-	result = hardware_init_port();
-	if (result < 0)
-		return result;
-
-	/* Initialize pulse/space widths */
-	init_timing_params(50, 38000);
-
-	/* If pin is high, then this must be an active low receiver. */
-	if (sense == -1) {
-		/* wait 1/2 sec for the power supply */
-		msleep(500);
-
-		/*
-		 * probe 9 times every 0.04s, collect "votes" for
-		 * active high/low
-		 */
-		nlow = 0;
-		nhigh = 0;
-		for (i = 0; i < 9; i++) {
-			if (sinp(UART_MSR) & hardware[type].signal_pin)
-				nlow++;
-			else
-				nhigh++;
-			msleep(40);
-		}
-		sense = nlow >= nhigh ? 1 : 0;
-		dev_info(&dev->dev, "auto-detected active %s receiver\n",
-			 sense ? "low" : "high");
-	} else
-		dev_info(&dev->dev, "Manually using active %s receiver\n",
-			 sense ? "low" : "high");
-
-	dev_dbg(&dev->dev, "Interrupt %d, port %04x obtained\n", irq, io);
-	return 0;
-}
-
 static int serial_ir_open(struct rc_dev *rcdev)
 {
 	unsigned long flags;
@@ -679,6 +611,123 @@ static int serial_ir_resume(struct platform_device *dev)
 	return 0;
 }
 
+static int serial_ir_probe(struct platform_device *dev)
+{
+	struct rc_dev *rcdev;
+	int i, nlow, nhigh, result;
+
+	rcdev = devm_rc_allocate_device(&dev->dev, RC_DRIVER_IR_RAW);
+	if (!rcdev)
+		return -ENOMEM;
+
+	if (hardware[type].send_pulse && hardware[type].send_space)
+		rcdev->tx_ir = serial_ir_tx;
+	if (hardware[type].set_send_carrier)
+		rcdev->s_tx_carrier = serial_ir_tx_carrier;
+	if (hardware[type].set_duty_cycle)
+		rcdev->s_tx_duty_cycle = serial_ir_tx_duty_cycle;
+
+	switch (type) {
+	case IR_HOMEBREW:
+		rcdev->input_name = "Serial IR type home-brew";
+		break;
+	case IR_IRDEO:
+		rcdev->input_name = "Serial IR type IRdeo";
+		break;
+	case IR_IRDEO_REMOTE:
+		rcdev->input_name = "Serial IR type IRdeo remote";
+		break;
+	case IR_ANIMAX:
+		rcdev->input_name = "Serial IR type AnimaX";
+		break;
+	case IR_IGOR:
+		rcdev->input_name = "Serial IR type IgorPlug";
+		break;
+	}
+
+	rcdev->input_phys = KBUILD_MODNAME "/input0";
+	rcdev->input_id.bustype = BUS_HOST;
+	rcdev->input_id.vendor = 0x0001;
+	rcdev->input_id.product = 0x0001;
+	rcdev->input_id.version = 0x0100;
+	rcdev->open = serial_ir_open;
+	rcdev->close = serial_ir_close;
+	rcdev->dev.parent = &serial_ir.pdev->dev;
+	rcdev->allowed_protocols = RC_BIT_ALL_IR_DECODER;
+	rcdev->driver_name = KBUILD_MODNAME;
+	rcdev->map_name = RC_MAP_RC6_MCE;
+	rcdev->min_timeout = 1;
+	rcdev->timeout = IR_DEFAULT_TIMEOUT;
+	rcdev->max_timeout = 10 * IR_DEFAULT_TIMEOUT;
+	rcdev->rx_resolution = 250000;
+
+	serial_ir.rcdev = rcdev;
+
+	setup_timer(&serial_ir.timeout_timer, serial_ir_timeout,
+		    (unsigned long)&serial_ir);
+
+	result = devm_request_irq(&dev->dev, irq, serial_ir_irq_handler,
+				  share_irq ? IRQF_SHARED : 0,
+				  KBUILD_MODNAME, &hardware);
+	if (result < 0) {
+		if (result == -EBUSY)
+			dev_err(&dev->dev, "IRQ %d busy\n", irq);
+		else if (result == -EINVAL)
+			dev_err(&dev->dev, "Bad irq number or handler\n");
+		return result;
+	}
+
+	/* Reserve io region. */
+	if ((iommap &&
+	     (devm_request_mem_region(&dev->dev, iommap, 8 << ioshift,
+				      KBUILD_MODNAME) == NULL)) ||
+	     (!iommap && (devm_request_region(&dev->dev, io, 8,
+			  KBUILD_MODNAME) == NULL))) {
+		dev_err(&dev->dev, "port %04x already in use\n", io);
+		dev_warn(&dev->dev, "use 'setserial /dev/ttySX uart none'\n");
+		dev_warn(&dev->dev,
+			 "or compile the serial port driver as module and\n");
+		dev_warn(&dev->dev, "make sure this module is loaded first\n");
+		return -EBUSY;
+	}
+
+	result = hardware_init_port();
+	if (result < 0)
+		return result;
+
+	/* Initialize pulse/space widths */
+	init_timing_params(50, 38000);
+
+	/* If pin is high, then this must be an active low receiver. */
+	if (sense == -1) {
+		/* wait 1/2 sec for the power supply */
+		msleep(500);
+
+		/*
+		 * probe 9 times every 0.04s, collect "votes" for
+		 * active high/low
+		 */
+		nlow = 0;
+		nhigh = 0;
+		for (i = 0; i < 9; i++) {
+			if (sinp(UART_MSR) & hardware[type].signal_pin)
+				nlow++;
+			else
+				nhigh++;
+			msleep(40);
+		}
+		sense = nlow >= nhigh ? 1 : 0;
+		dev_info(&dev->dev, "auto-detected active %s receiver\n",
+			 sense ? "low" : "high");
+	} else
+		dev_info(&dev->dev, "Manually using active %s receiver\n",
+			 sense ? "low" : "high");
+
+	dev_dbg(&dev->dev, "Interrupt %d, port %04x obtained\n", irq, io);
+
+	return devm_rc_register_device(&dev->dev, rcdev);
+}
+
 static struct platform_driver serial_ir_driver = {
 	.probe		= serial_ir_probe,
 	.suspend	= serial_ir_suspend,
@@ -723,7 +772,6 @@ static void serial_ir_exit(void)
 
 static int __init serial_ir_init_module(void)
 {
-	struct rc_dev *rcdev;
 	int result;
 
 	switch (type) {
@@ -754,63 +802,9 @@ static int __init serial_ir_init_module(void)
 		sense = !!sense;
 
 	result = serial_ir_init();
-	if (result)
-		return result;
-
-	rcdev = devm_rc_allocate_device(&serial_ir.pdev->dev, RC_DRIVER_IR_RAW);
-	if (!rcdev) {
-		result = -ENOMEM;
-		goto serial_cleanup;
-	}
-
-	if (hardware[type].send_pulse && hardware[type].send_space)
-		rcdev->tx_ir = serial_ir_tx;
-	if (hardware[type].set_send_carrier)
-		rcdev->s_tx_carrier = serial_ir_tx_carrier;
-	if (hardware[type].set_duty_cycle)
-		rcdev->s_tx_duty_cycle = serial_ir_tx_duty_cycle;
-
-	switch (type) {
-	case IR_HOMEBREW:
-		rcdev->input_name = "Serial IR type home-brew";
-		break;
-	case IR_IRDEO:
-		rcdev->input_name = "Serial IR type IRdeo";
-		break;
-	case IR_IRDEO_REMOTE:
-		rcdev->input_name = "Serial IR type IRdeo remote";
-		break;
-	case IR_ANIMAX:
-		rcdev->input_name = "Serial IR type AnimaX";
-		break;
-	case IR_IGOR:
-		rcdev->input_name = "Serial IR type IgorPlug";
-		break;
-	}
-
-	rcdev->input_phys = KBUILD_MODNAME "/input0";
-	rcdev->input_id.bustype = BUS_HOST;
-	rcdev->input_id.vendor = 0x0001;
-	rcdev->input_id.product = 0x0001;
-	rcdev->input_id.version = 0x0100;
-	rcdev->open = serial_ir_open;
-	rcdev->close = serial_ir_close;
-	rcdev->dev.parent = &serial_ir.pdev->dev;
-	rcdev->allowed_protocols = RC_BIT_ALL_IR_DECODER;
-	rcdev->driver_name = KBUILD_MODNAME;
-	rcdev->map_name = RC_MAP_RC6_MCE;
-	rcdev->min_timeout = 1;
-	rcdev->timeout = IR_DEFAULT_TIMEOUT;
-	rcdev->max_timeout = 10 * IR_DEFAULT_TIMEOUT;
-	rcdev->rx_resolution = 250000;
-
-	serial_ir.rcdev = rcdev;
-
-	result = rc_register_device(rcdev);
-
 	if (!result)
 		return 0;
-serial_cleanup:
+
 	serial_ir_exit();
 	return result;
 }
@@ -818,7 +812,6 @@ static int __init serial_ir_init_module(void)
 static void __exit serial_ir_exit_module(void)
 {
 	del_timer_sync(&serial_ir.timeout_timer);
-	rc_unregister_device(serial_ir.rcdev);
 	serial_ir_exit();
 }
 
-- 
2.9.3

On Sat, Feb 25, 2017 at 11:14:25AM +0000, Sean Young wrote:
> On Fri, Feb 24, 2017 at 11:15:51AM -0800, Linus Torvalds wrote:
> > Added more relevant people. I've debugged the immediate problem below,
> > but I think there's another problem that actually triggered this.
> > 
> > On Fri, Feb 24, 2017 at 10:28 AM, kernel test robot
> > <fengguang.wu@intel.com> wrote:
> > >
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >
> > > commit ff58d005cd10fcd372787cceac547e11cf706ff6
> > > Merge: 5ab3566 9eeb0ed
> > >
> > >     Merge tag 'media/v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> > [...]
> > > [    4.664940] rc rc0: lirc_dev: driver ir-lirc-codec (rc-loopback) registered at minor = 0
> > > [    4.666322] BUG: unable to handle kernel NULL pointer dereference at 0000039c
> > > [    4.666675] IP: serial_ir_irq_handler+0x189/0x410
> > 
> > This merge being fingered ends up being a subtle interaction with other changes.
> > 
> > Those "other changes" are (again) the interrupt retrigger code that
> > was reverted for 4.10, and then we tried to merge them again this
> > merge window.
> > 
> > Because the immediate cause is:
> > 
> > > [    4.666675] EIP: serial_ir_irq_handler+0x189/0x410
> > > [    4.666675] Call Trace:
> > > [    4.666675]  <IRQ>
> > > [    4.666675]  __handle_irq_event_percpu+0x57/0x100
> > > [    4.666675]  handle_irq_event_percpu+0x1d/0x50
> > > [    4.666675]  handle_irq_event+0x32/0x60
> > > [    4.666675]  handle_edge_irq+0xa5/0x120
> > > [    4.666675]  handle_irq+0x9d/0xd0
> > > [    4.666675]  </IRQ>
> > > [    4.666675]  do_IRQ+0x5f/0x130
> > > [    4.666675]  common_interrupt+0x33/0x38
> > > [    4.666675] EIP: hardware_init_port+0x3f/0x190
> > > [    4.666675] EFLAGS: 00200246 CPU: 0
> > > [    4.666675] EAX: c718990f EBX: 00000000 ECX: 00000000 EDX: 000003f9
> > > [    4.666675] ESI: 000003f9 EDI: 000003f8 EBP: c0065d98 ESP: c0065d84
> > > [    4.666675]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > > [    4.666675]  serial_ir_probe+0xbb/0x300
> > > [    4.666675]  platform_drv_probe+0x48/0xb0
> > ...
> > 
> > ie an interrupt came in immediately after the request_irq(), before
> > all the data was properly set up, which then causes the interrupt
> > handler to take a fault because it tries to access some field that
> > hasn't even been set up yet.
> 
> Oh dear. I've pointed out others making the same mistake when doing code
> reviews, clearly I need review my own code better.
> 
> > 
> > The code line is helpful, the faulting instruction is
> > 
> >       mov    0x39c(%rax),%eax   <--- fault
> >       call ..
> >       mov    someglobalvar,%edx
> > 
> > which together with the supplied config file makes me able to match it
> > up with the assembly generation around it:
> > 
> >         inb %dx, %al    # tmp254, value
> >         andb    $1, %al #, tmp255
> >         testb   %al, %al        # tmp255
> >         je      .L233   #,
> >   .L215:
> >         movl    serial_ir+8, %eax       # serial_ir.rcdev, serial_ir.rcdev
> >         xorl    %edx, %edx      # _66->timeout
> >         movl    924(%eax), %eax # _66->timeout, _66->timeout
> >         call    nsecs_to_jiffies        #
> >         movl    jiffies, %edx   # jiffies, jiffies.33_70
> >         addl    %eax, %edx      # _69, tmp259
> >         movl    $serial_ir+16, %eax     #,
> >         call    mod_timer       #
> >         movl    serial_ir+8, %eax       # serial_ir.rcdev,
> >         call    ir_raw_event_handle     #
> >         movl    $1, %eax        #, <retval>
> > 
> > so it's that "serial_ir.rcdev->timeout" access that faults. So this is
> > the faulting source code:
> > 
> > drivers/media/rc/serial_ir.c: 402
> > 
> >         mod_timer(&serial_ir.timeout_timer,
> >                   jiffies + nsecs_to_jiffies(serial_ir.rcdev->timeout));
> > 
> >         ir_raw_event_handle(serial_ir.rcdev);
> > 
> >         return IRQ_HANDLED;
> > 
> > and serial_ir.rcdev is NULL when ti tries to look up the timeout.
> 
> ir_raw_event_handle() call will also go bang if passed a null pointer, so
> this problem existed before (since v4.10).
> 
> Thanks for debugging this, I'll send a patch as a reply to this email.
> 
> 
> Sean

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH] [media] serial_ir: ensure we're ready to receive interrupts
  2017-02-25 11:28     ` [PATCH] [media] serial_ir: ensure we're ready to receive interrupts Sean Young
@ 2017-02-25 13:34       ` Mauro Carvalho Chehab
  2017-02-25 13:54         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2017-02-25 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

Em Sat, 25 Feb 2017 11:28:16 +0000
Sean Young <sean@mess.org> escreveu:

> When the interrupt requested with devm_request_irq(), serial_ir.rcdev
> is still null so will cause null deference if the irq handler is called
> early on.
> 
> Also ensure that timeout_timer is setup.
> 
> Link: http://lkml.kernel.org/r/CA+55aFxsh2uF8gi5sN_guY3Z+tiLv7LpJYKBw+y8vqLzp+TsnQ at mail.gmail.com
> 
> Cc: <stable@vger.kernel.org> # 4.10
> Signed-off-by: Sean Young <sean@mess.org>
> ---
>  drivers/media/rc/serial_ir.c | 243 +++++++++++++++++++++----------------------
>  1 file changed, 118 insertions(+), 125 deletions(-)
> 
> diff --git a/drivers/media/rc/serial_ir.c b/drivers/media/rc/serial_ir.c
> index 923fb22..22144b4 100644
> --- a/drivers/media/rc/serial_ir.c
> +++ b/drivers/media/rc/serial_ir.c
> @@ -487,74 +487,6 @@ static void serial_ir_timeout(unsigned long arg)
>  	ir_raw_event_handle(serial_ir.rcdev);
>  }
>  
> -static int serial_ir_probe(struct platform_device *dev)
> -{
> -	int i, nlow, nhigh, result;

Hmm... why did you move this function to be after serial_ir_open()?

That messes with the diff without no good reason, making harder to
identify what you changed here.

> -
> -	result = devm_request_irq(&dev->dev, irq, serial_ir_irq_handler,
> -				  share_irq ? IRQF_SHARED : 0,
> -				  KBUILD_MODNAME, &hardware);
> -	if (result < 0) {
> -		if (result == -EBUSY)
> -			dev_err(&dev->dev, "IRQ %d busy\n", irq);
> -		else if (result == -EINVAL)
> -			dev_err(&dev->dev, "Bad irq number or handler\n");
> -		return result;
> -	}
> -
> -	/* Reserve io region. */
> -	if ((iommap &&
> -	     (devm_request_mem_region(&dev->dev, iommap, 8 << ioshift,
> -				      KBUILD_MODNAME) == NULL)) ||
> -	     (!iommap && (devm_request_region(&dev->dev, io, 8,
> -			  KBUILD_MODNAME) == NULL))) {
> -		dev_err(&dev->dev, "port %04x already in use\n", io);
> -		dev_warn(&dev->dev, "use 'setserial /dev/ttySX uart none'\n");
> -		dev_warn(&dev->dev,
> -			 "or compile the serial port driver as module and\n");
> -		dev_warn(&dev->dev, "make sure this module is loaded first\n");
> -		return -EBUSY;
> -	}
> -
> -	setup_timer(&serial_ir.timeout_timer, serial_ir_timeout,
> -		    (unsigned long)&serial_ir);
> -
> -	result = hardware_init_port();
> -	if (result < 0)
> -		return result;
> -
> -	/* Initialize pulse/space widths */
> -	init_timing_params(50, 38000);
> -
> -	/* If pin is high, then this must be an active low receiver. */
> -	if (sense == -1) {
> -		/* wait 1/2 sec for the power supply */
> -		msleep(500);
> -
> -		/*
> -		 * probe 9 times every 0.04s, collect "votes" for
> -		 * active high/low
> -		 */
> -		nlow = 0;
> -		nhigh = 0;
> -		for (i = 0; i < 9; i++) {
> -			if (sinp(UART_MSR) & hardware[type].signal_pin)
> -				nlow++;
> -			else
> -				nhigh++;
> -			msleep(40);
> -		}
> -		sense = nlow >= nhigh ? 1 : 0;
> -		dev_info(&dev->dev, "auto-detected active %s receiver\n",
> -			 sense ? "low" : "high");
> -	} else
> -		dev_info(&dev->dev, "Manually using active %s receiver\n",
> -			 sense ? "low" : "high");
> -
> -	dev_dbg(&dev->dev, "Interrupt %d, port %04x obtained\n", irq, io);
> -	return 0;
> -}
> -
>  static int serial_ir_open(struct rc_dev *rcdev)
>  {
>  	unsigned long flags;
> @@ -679,6 +611,123 @@ static int serial_ir_resume(struct platform_device *dev)
>  	return 0;
>  }
>  
> +static int serial_ir_probe(struct platform_device *dev)
> +{
> +	struct rc_dev *rcdev;
> +	int i, nlow, nhigh, result;
> +
> +	rcdev = devm_rc_allocate_device(&dev->dev, RC_DRIVER_IR_RAW);
> +	if (!rcdev)
> +		return -ENOMEM;
> +
> +	if (hardware[type].send_pulse && hardware[type].send_space)
> +		rcdev->tx_ir = serial_ir_tx;
> +	if (hardware[type].set_send_carrier)
> +		rcdev->s_tx_carrier = serial_ir_tx_carrier;
> +	if (hardware[type].set_duty_cycle)
> +		rcdev->s_tx_duty_cycle = serial_ir_tx_duty_cycle;
> +
> +	switch (type) {
> +	case IR_HOMEBREW:
> +		rcdev->input_name = "Serial IR type home-brew";
> +		break;
> +	case IR_IRDEO:
> +		rcdev->input_name = "Serial IR type IRdeo";
> +		break;
> +	case IR_IRDEO_REMOTE:
> +		rcdev->input_name = "Serial IR type IRdeo remote";
> +		break;
> +	case IR_ANIMAX:
> +		rcdev->input_name = "Serial IR type AnimaX";
> +		break;
> +	case IR_IGOR:
> +		rcdev->input_name = "Serial IR type IgorPlug";
> +		break;
> +	}
> +
> +	rcdev->input_phys = KBUILD_MODNAME "/input0";
> +	rcdev->input_id.bustype = BUS_HOST;
> +	rcdev->input_id.vendor = 0x0001;
> +	rcdev->input_id.product = 0x0001;
> +	rcdev->input_id.version = 0x0100;
> +	rcdev->open = serial_ir_open;
> +	rcdev->close = serial_ir_close;
> +	rcdev->dev.parent = &serial_ir.pdev->dev;
> +	rcdev->allowed_protocols = RC_BIT_ALL_IR_DECODER;
> +	rcdev->driver_name = KBUILD_MODNAME;
> +	rcdev->map_name = RC_MAP_RC6_MCE;
> +	rcdev->min_timeout = 1;
> +	rcdev->timeout = IR_DEFAULT_TIMEOUT;
> +	rcdev->max_timeout = 10 * IR_DEFAULT_TIMEOUT;
> +	rcdev->rx_resolution = 250000;
> +
> +	serial_ir.rcdev = rcdev;
> +
> +	setup_timer(&serial_ir.timeout_timer, serial_ir_timeout,
> +		    (unsigned long)&serial_ir);
> +
> +	result = devm_request_irq(&dev->dev, irq, serial_ir_irq_handler,
> +				  share_irq ? IRQF_SHARED : 0,
> +				  KBUILD_MODNAME, &hardware);
> +	if (result < 0) {
> +		if (result == -EBUSY)
> +			dev_err(&dev->dev, "IRQ %d busy\n", irq);
> +		else if (result == -EINVAL)
> +			dev_err(&dev->dev, "Bad irq number or handler\n");
> +		return result;
> +	}
> +
> +	/* Reserve io region. */
> +	if ((iommap &&
> +	     (devm_request_mem_region(&dev->dev, iommap, 8 << ioshift,
> +				      KBUILD_MODNAME) == NULL)) ||
> +	     (!iommap && (devm_request_region(&dev->dev, io, 8,
> +			  KBUILD_MODNAME) == NULL))) {
> +		dev_err(&dev->dev, "port %04x already in use\n", io);
> +		dev_warn(&dev->dev, "use 'setserial /dev/ttySX uart none'\n");
> +		dev_warn(&dev->dev,
> +			 "or compile the serial port driver as module and\n");
> +		dev_warn(&dev->dev, "make sure this module is loaded first\n");
> +		return -EBUSY;
> +	}
> +
> +	result = hardware_init_port();
> +	if (result < 0)
> +		return result;
> +
> +	/* Initialize pulse/space widths */
> +	init_timing_params(50, 38000);
> +
> +	/* If pin is high, then this must be an active low receiver. */
> +	if (sense == -1) {
> +		/* wait 1/2 sec for the power supply */
> +		msleep(500);
> +
> +		/*
> +		 * probe 9 times every 0.04s, collect "votes" for
> +		 * active high/low
> +		 */
> +		nlow = 0;
> +		nhigh = 0;
> +		for (i = 0; i < 9; i++) {
> +			if (sinp(UART_MSR) & hardware[type].signal_pin)
> +				nlow++;
> +			else
> +				nhigh++;
> +			msleep(40);
> +		}
> +		sense = nlow >= nhigh ? 1 : 0;
> +		dev_info(&dev->dev, "auto-detected active %s receiver\n",
> +			 sense ? "low" : "high");
> +	} else
> +		dev_info(&dev->dev, "Manually using active %s receiver\n",
> +			 sense ? "low" : "high");
> +
> +	dev_dbg(&dev->dev, "Interrupt %d, port %04x obtained\n", irq, io);
> +
> +	return devm_rc_register_device(&dev->dev, rcdev);
> +}
> +
>  static struct platform_driver serial_ir_driver = {
>  	.probe		= serial_ir_probe,
>  	.suspend	= serial_ir_suspend,
> @@ -723,7 +772,6 @@ static void serial_ir_exit(void)
>  
>  static int __init serial_ir_init_module(void)
>  {
> -	struct rc_dev *rcdev;
>  	int result;
>  
>  	switch (type) {
> @@ -754,63 +802,9 @@ static int __init serial_ir_init_module(void)
>  		sense = !!sense;
>  
>  	result = serial_ir_init();
> -	if (result)
> -		return result;
> -
> -	rcdev = devm_rc_allocate_device(&serial_ir.pdev->dev, RC_DRIVER_IR_RAW);
> -	if (!rcdev) {
> -		result = -ENOMEM;
> -		goto serial_cleanup;
> -	}
> -
> -	if (hardware[type].send_pulse && hardware[type].send_space)
> -		rcdev->tx_ir = serial_ir_tx;
> -	if (hardware[type].set_send_carrier)
> -		rcdev->s_tx_carrier = serial_ir_tx_carrier;
> -	if (hardware[type].set_duty_cycle)
> -		rcdev->s_tx_duty_cycle = serial_ir_tx_duty_cycle;
> -
> -	switch (type) {
> -	case IR_HOMEBREW:
> -		rcdev->input_name = "Serial IR type home-brew";
> -		break;
> -	case IR_IRDEO:
> -		rcdev->input_name = "Serial IR type IRdeo";
> -		break;
> -	case IR_IRDEO_REMOTE:
> -		rcdev->input_name = "Serial IR type IRdeo remote";
> -		break;
> -	case IR_ANIMAX:
> -		rcdev->input_name = "Serial IR type AnimaX";
> -		break;
> -	case IR_IGOR:
> -		rcdev->input_name = "Serial IR type IgorPlug";
> -		break;
> -	}
> -
> -	rcdev->input_phys = KBUILD_MODNAME "/input0";
> -	rcdev->input_id.bustype = BUS_HOST;
> -	rcdev->input_id.vendor = 0x0001;
> -	rcdev->input_id.product = 0x0001;
> -	rcdev->input_id.version = 0x0100;
> -	rcdev->open = serial_ir_open;
> -	rcdev->close = serial_ir_close;
> -	rcdev->dev.parent = &serial_ir.pdev->dev;
> -	rcdev->allowed_protocols = RC_BIT_ALL_IR_DECODER;
> -	rcdev->driver_name = KBUILD_MODNAME;
> -	rcdev->map_name = RC_MAP_RC6_MCE;
> -	rcdev->min_timeout = 1;
> -	rcdev->timeout = IR_DEFAULT_TIMEOUT;
> -	rcdev->max_timeout = 10 * IR_DEFAULT_TIMEOUT;
> -	rcdev->rx_resolution = 250000;
> -
> -	serial_ir.rcdev = rcdev;
> -
> -	result = rc_register_device(rcdev);
> -
>  	if (!result)
>  		return 0;
> -serial_cleanup:
> +
>  	serial_ir_exit();
>  	return result;
>  }
> @@ -818,7 +812,6 @@ static int __init serial_ir_init_module(void)
>  static void __exit serial_ir_exit_module(void)
>  {
>  	del_timer_sync(&serial_ir.timeout_timer);
> -	rc_unregister_device(serial_ir.rcdev);
>  	serial_ir_exit();
>  }
>  



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] [media] serial_ir: ensure we're ready to receive interrupts
  2017-02-25 13:34       ` Mauro Carvalho Chehab
@ 2017-02-25 13:54         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2017-02-25 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

Em Sat, 25 Feb 2017 10:34:37 -0300
Mauro Carvalho Chehab <mchehab@infradead.org> escreveu:

> Em Sat, 25 Feb 2017 11:28:16 +0000
> Sean Young <sean@mess.org> escreveu:
> 
> > When the interrupt requested with devm_request_irq(), serial_ir.rcdev
> > is still null so will cause null deference if the irq handler is called
> > early on.
> > 
> > Also ensure that timeout_timer is setup.
> > 
> > Link: http://lkml.kernel.org/r/CA+55aFxsh2uF8gi5sN_guY3Z+tiLv7LpJYKBw+y8vqLzp+TsnQ at mail.gmail.com
> > 
> > Cc: <stable@vger.kernel.org> # 4.10
> > Signed-off-by: Sean Young <sean@mess.org>
> > ---
> >  drivers/media/rc/serial_ir.c | 243 +++++++++++++++++++++----------------------
> >  1 file changed, 118 insertions(+), 125 deletions(-)
> > 
> > diff --git a/drivers/media/rc/serial_ir.c b/drivers/media/rc/serial_ir.c
> > index 923fb22..22144b4 100644
> > --- a/drivers/media/rc/serial_ir.c
> > +++ b/drivers/media/rc/serial_ir.c
> > @@ -487,74 +487,6 @@ static void serial_ir_timeout(unsigned long arg)
> >  	ir_raw_event_handle(serial_ir.rcdev);
> >  }
> >  
> > -static int serial_ir_probe(struct platform_device *dev)
> > -{
> > -	int i, nlow, nhigh, result;
> 
> Hmm... why did you move this function to be after serial_ir_open()?
> 
> That messes with the diff without no good reason, making harder to
> identify what you changed here.

Ok, just reordered it. The diff now looks sane. Patch looks ok on
my eyes.

[PATCH v2] serial_ir: ensure we're ready to receive interrupts
From: Sean Young <sean@mess.org>

When the interrupt requested with devm_request_irq(), serial_ir.rcdev
is still null so will cause null deference if the irq handler is called
early on.

Also ensure that timeout_timer is setup.

Link: http://lkml.kernel.org/r/CA+55aFxsh2uF8gi5sN_guY3Z+tiLv7LpJYKBw+y8vqLzp+TsnQ at mail.gmail.com

[mchehab at s-opensource.com: moved serial_ir_probe() back to its original place]
Cc: <stable@vger.kernel.org> # 4.10
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

diff --git a/drivers/media/rc/serial_ir.c b/drivers/media/rc/serial_ir.c
index 923fb2299553..4687c34c655a 100644
--- a/drivers/media/rc/serial_ir.c
+++ b/drivers/media/rc/serial_ir.c
@@ -489,8 +489,59 @@ static void serial_ir_timeout(unsigned long arg)
 
 static int serial_ir_probe(struct platform_device *dev)
 {
+	struct rc_dev *rcdev;
 	int i, nlow, nhigh, result;
 
+	rcdev = devm_rc_allocate_device(&dev->dev, RC_DRIVER_IR_RAW);
+	if (!rcdev)
+		return -ENOMEM;
+
+	if (hardware[type].send_pulse && hardware[type].send_space)
+		rcdev->tx_ir = serial_ir_tx;
+	if (hardware[type].set_send_carrier)
+		rcdev->s_tx_carrier = serial_ir_tx_carrier;
+	if (hardware[type].set_duty_cycle)
+		rcdev->s_tx_duty_cycle = serial_ir_tx_duty_cycle;
+
+	switch (type) {
+	case IR_HOMEBREW:
+		rcdev->input_name = "Serial IR type home-brew";
+		break;
+	case IR_IRDEO:
+		rcdev->input_name = "Serial IR type IRdeo";
+		break;
+	case IR_IRDEO_REMOTE:
+		rcdev->input_name = "Serial IR type IRdeo remote";
+		break;
+	case IR_ANIMAX:
+		rcdev->input_name = "Serial IR type AnimaX";
+		break;
+	case IR_IGOR:
+		rcdev->input_name = "Serial IR type IgorPlug";
+		break;
+	}
+
+	rcdev->input_phys = KBUILD_MODNAME "/input0";
+	rcdev->input_id.bustype = BUS_HOST;
+	rcdev->input_id.vendor = 0x0001;
+	rcdev->input_id.product = 0x0001;
+	rcdev->input_id.version = 0x0100;
+	rcdev->open = serial_ir_open;
+	rcdev->close = serial_ir_close;
+	rcdev->dev.parent = &serial_ir.pdev->dev;
+	rcdev->allowed_protocols = RC_BIT_ALL_IR_DECODER;
+	rcdev->driver_name = KBUILD_MODNAME;
+	rcdev->map_name = RC_MAP_RC6_MCE;
+	rcdev->min_timeout = 1;
+	rcdev->timeout = IR_DEFAULT_TIMEOUT;
+	rcdev->max_timeout = 10 * IR_DEFAULT_TIMEOUT;
+	rcdev->rx_resolution = 250000;
+
+	serial_ir.rcdev = rcdev;
+
+	setup_timer(&serial_ir.timeout_timer, serial_ir_timeout,
+		    (unsigned long)&serial_ir);
+
 	result = devm_request_irq(&dev->dev, irq, serial_ir_irq_handler,
 				  share_irq ? IRQF_SHARED : 0,
 				  KBUILD_MODNAME, &hardware);
@@ -516,9 +567,6 @@ static int serial_ir_probe(struct platform_device *dev)
 		return -EBUSY;
 	}
 
-	setup_timer(&serial_ir.timeout_timer, serial_ir_timeout,
-		    (unsigned long)&serial_ir);
-
 	result = hardware_init_port();
 	if (result < 0)
 		return result;
@@ -552,7 +600,8 @@ static int serial_ir_probe(struct platform_device *dev)
 			 sense ? "low" : "high");
 
 	dev_dbg(&dev->dev, "Interrupt %d, port %04x obtained\n", irq, io);
-	return 0;
+
+	return devm_rc_register_device(&dev->dev, rcdev);
 }
 
 static int serial_ir_open(struct rc_dev *rcdev)
@@ -723,7 +772,6 @@ static void serial_ir_exit(void)
 
 static int __init serial_ir_init_module(void)
 {
-	struct rc_dev *rcdev;
 	int result;
 
 	switch (type) {
@@ -754,63 +802,9 @@ static int __init serial_ir_init_module(void)
 		sense = !!sense;
 
 	result = serial_ir_init();
-	if (result)
-		return result;
-
-	rcdev = devm_rc_allocate_device(&serial_ir.pdev->dev, RC_DRIVER_IR_RAW);
-	if (!rcdev) {
-		result = -ENOMEM;
-		goto serial_cleanup;
-	}
-
-	if (hardware[type].send_pulse && hardware[type].send_space)
-		rcdev->tx_ir = serial_ir_tx;
-	if (hardware[type].set_send_carrier)
-		rcdev->s_tx_carrier = serial_ir_tx_carrier;
-	if (hardware[type].set_duty_cycle)
-		rcdev->s_tx_duty_cycle = serial_ir_tx_duty_cycle;
-
-	switch (type) {
-	case IR_HOMEBREW:
-		rcdev->input_name = "Serial IR type home-brew";
-		break;
-	case IR_IRDEO:
-		rcdev->input_name = "Serial IR type IRdeo";
-		break;
-	case IR_IRDEO_REMOTE:
-		rcdev->input_name = "Serial IR type IRdeo remote";
-		break;
-	case IR_ANIMAX:
-		rcdev->input_name = "Serial IR type AnimaX";
-		break;
-	case IR_IGOR:
-		rcdev->input_name = "Serial IR type IgorPlug";
-		break;
-	}
-
-	rcdev->input_phys = KBUILD_MODNAME "/input0";
-	rcdev->input_id.bustype = BUS_HOST;
-	rcdev->input_id.vendor = 0x0001;
-	rcdev->input_id.product = 0x0001;
-	rcdev->input_id.version = 0x0100;
-	rcdev->open = serial_ir_open;
-	rcdev->close = serial_ir_close;
-	rcdev->dev.parent = &serial_ir.pdev->dev;
-	rcdev->allowed_protocols = RC_BIT_ALL_IR_DECODER;
-	rcdev->driver_name = KBUILD_MODNAME;
-	rcdev->map_name = RC_MAP_RC6_MCE;
-	rcdev->min_timeout = 1;
-	rcdev->timeout = IR_DEFAULT_TIMEOUT;
-	rcdev->max_timeout = 10 * IR_DEFAULT_TIMEOUT;
-	rcdev->rx_resolution = 250000;
-
-	serial_ir.rcdev = rcdev;
-
-	result = rc_register_device(rcdev);
-
 	if (!result)
 		return 0;
-serial_cleanup:
+
 	serial_ir_exit();
 	return result;
 }
@@ -818,7 +812,6 @@ static int __init serial_ir_init_module(void)
 static void __exit serial_ir_exit_module(void)
 {
 	del_timer_sync(&serial_ir.timeout_timer);
-	rc_unregister_device(serial_ir.rcdev);
 	serial_ir_exit();
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-02-28 10:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <58b07b30.9XFLj9Hhl7F6HMc2%fengguang.wu@intel.com>
2017-02-24 19:15 ` [WARNING: A/V UNSCANNABLE][Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c Linus Torvalds
2017-02-25  9:07   ` Ingo Molnar
2017-02-25 18:02     ` Linus Torvalds
2017-02-27 10:09       ` Thomas Gleixner
2017-02-27 12:32         ` Thomas Gleixner
2017-02-27 15:41           ` Ingo Molnar
2017-02-27 16:07             ` Tony Lindgren
2017-02-27 16:18               ` Thomas Gleixner
2017-02-27 16:26                 ` Tony Lindgren
2017-02-27 16:12             ` Thomas Gleixner
2017-02-27 19:23             ` Linus Torvalds
2017-02-28  7:25               ` Ingo Molnar
2017-02-28 10:51               ` Thomas Gleixner
2017-02-25 11:14   ` Sean Young
2017-02-25 11:28     ` [PATCH] [media] serial_ir: ensure we're ready to receive interrupts Sean Young
2017-02-25 13:34       ` Mauro Carvalho Chehab
2017-02-25 13:54         ` Mauro Carvalho Chehab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).