linux-arm-kernel.lists.infradead.org archive mirror
* BUG: sleeping function called from invalid context at arch/arm/mm/fault.c:301
@ 2013-09-18  8:54 Krzysztof Hałasa
From: Krzysztof Hałasa @ 2013-09-18  8:54 UTC
  To: linux-arm-kernel

Hi,

IXP435, v3.11: the card hardware has gone wild (another issue) and
generates a data abort on register access in the ISR. The issue here is
the warning in do_page_fault(); is it normal?

Just got this:

# rmmod solo6x10
BUG: sleeping function called from invalid context at arch/arm/mm/fault.c:301
in_atomic(): 0, irqs_disabled(): 128, pid: 303, name: rmmod
3 locks held by rmmod/303:
 #0:  (&__lockdep_no_validate__){......}, at: [<c0134ce4>] driver_detach+0x44/0xb8
 #1:  (&__lockdep_no_validate__){......}, at: [<c0134cf0>] driver_detach+0x50/0xb8
 #2:  (&mm->mmap_sem){++++++}, at: [<c000f210>] do_page_fault+0x68/0x294
irq event stamp: 5962
hardirqs last  enabled at (5961): [<c0273b4c>] _raw_spin_unlock_irqrestore+0x30/0x5c
hardirqs last disabled at (5962): [<c005afb8>] __free_irq+0x150/0x1d0
softirqs last  enabled at (0): [<c00149d4>] copy_process.part.64+0x27c/0xd4c
softirqs last disabled at (0): [<  (null)>]   (null)
CPU: 0 PID: 303 Comm: rmmod Tainted: G         C   3.11.0+ #51
[<c000db48>] (unwind_backtrace+0x0/0xf0) from [<c000b59c>] (show_stack+0x10/0x14)
[<c000b59c>] (show_stack+0x10/0x14) from [<c000f37c>] (do_page_fault+0x1d4/0x294)
[<c000f37c>] (do_page_fault+0x1d4/0x294) from [<c0008364>] (do_DataAbort+0x34/0x9c)
[<c0008364>] (do_DataAbort+0x34/0x9c) from [<c000bf9c>] (__dabt_svc+0x3c/0x60)
Exception stack(0xc6ad3e50 to 0xc6ad3e98)
3e40:                                     0000001c c78fa000 0000174a c8ca0000
3e60: c6b2a3c0 c78fa000 c78fa000 20000013 0000001c c6ad2000 a0000013 00000000
3e80: 0000000e c6ad3e98 c005afc8 bf04be98 20000093 ffffffff
[<c000bf9c>] (__dabt_svc+0x3c/0x60) from [<bf04be98>] (solo_isr+0x10/0x200 [solo6x10])
[<bf04be98>] (solo_isr+0x10/0x200 [solo6x10]) from [<c005afc8>] (__free_irq+0x160/0x1d0)
[<c005afc8>] (__free_irq+0x160/0x1d0) from [<c005b078>] (free_irq+0x40/0x84)
[<c005b078>] (free_irq+0x40/0x84) from [<bf04b0e0>] (free_solo_dev+0xdc/0xe4 [solo6x10])
[<bf04b0e0>] (free_solo_dev+0xdc/0xe4 [solo6x10]) from [<c0110d70>] (pci_device_remove+0x38/0xac)
[<c0110d70>] (pci_device_remove+0x38/0xac) from [<c0134544>] (__device_release_driver+0x70/0xd0)
[<c0134544>] (__device_release_driver+0x70/0xd0) from [<c0134d54>] (driver_detach+0xb4/0xb8)
[<c0134d54>] (driver_detach+0xb4/0xb8) from [<c0134370>] (bus_remove_driver+0x7c/0xc4)
[<c0134370>] (bus_remove_driver+0x7c/0xc4) from [<c0110ebc>] (pci_unregister_driver+0x14/0x74)
[<c0110ebc>] (pci_unregister_driver+0x14/0x74) from [<c005705c>] (SyS_delete_module+0x170/0x248)
[<c005705c>] (SyS_delete_module+0x170/0x248) from [<c0009040>] (ret_fast_syscall+0x0/0x44)

--
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland


* BUG: sleeping function called from invalid context at arch/arm/mm/fault.c:301
@ 2013-09-18  9:41 ` Russell King - ARM Linux
From: Russell King - ARM Linux @ 2013-09-18  9:41 UTC
  To: linux-arm-kernel

On Wed, Sep 18, 2013 at 10:54:19AM +0200, Krzysztof Hałasa wrote:
> IXP435, v3.11: the card hardware has gone wild (another issue) and
> generates a data abort on register access in the ISR. The issue here is
> the warning in do_page_fault(); is it normal?

It is normal if the data abort gets caused from a non-atomic context.
The real question is what is solo_isr() doing causing a data abort
in the first place.

I suspect it's this great bit of coding in free_solo_dev():

                pci_iounmap(pdev, solo_dev->reg_base);
                if (pdev->irq)
                        free_irq(pdev->irq, solo_dev);

So, what happens if you receive an IRQ (possibly shared by other PCI
devices) but you've unmapped the registers?

        status = solo_reg_read(solo_dev, SOLO_IRQ_STAT);
        if (!status)
                return IRQ_NONE;

where solo_reg_read() does this:
        ret = readl(solo_dev->reg_base + reg);

Yep, we try to read from memory we've just unmapped.

It's a driver bug.  Please report this to the driver authors.


* BUG: sleeping function called from invalid context at arch/arm/mm/fault.c:301
@ 2013-09-18 13:01   ` Krzysztof Hałasa
From: Krzysztof Hałasa @ 2013-09-18 13:01 UTC
  To: linux-arm-kernel

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

>> IXP435, v3.11: the card hardware has gone wild (another issue) and
>> generates a data abort on register access in the ISR. The issue here is
>> the warning in do_page_fault(); is it normal?
>
> It is normal if the data abort gets caused from a non-atomic context.

Hmm, the ISR should be running in atomic context...

#define in_atomic()     ((preempt_count() & ~PREEMPT_ACTIVE) != 0)

I think a hardware IRQ is supposed to add_preempt_count() via
irq_enter() -> __irq_enter()? How could in_atomic() return 0?
A delayed, imprecise abort?

> The real question is what is solo_isr() doing causing a data abort
> in the first place.

Right.

> I suspect it's this great bit of coding in free_solo_dev():

		/* Now cleanup the PCI device */
		solo_irq_off(solo_dev, ~0);
		pci_iounmap(pdev, solo_dev->reg_base);
		if (pdev->irq)
			free_irq(pdev->irq, solo_dev);

solo_irq_off() tries to disable the IRQ by masking it in the hardware, but I
guess there may be a time window in which a problem can show up.
Anyway, there is no reason to unmap the registers before free_irq().

> Yep, we try to read from memory we've just unmapped.

> It's a driver bug.  Please report this to the driver authors.

Right (that's a "staging" driver, so I'd better fix it myself).
Thanks.
-- 
Krzysztof Halasa


* BUG: sleeping function called from invalid context at arch/arm/mm/fault.c:301
@ 2013-09-18 15:07     ` Russell King - ARM Linux
From: Russell King - ARM Linux @ 2013-09-18 15:07 UTC
  To: linux-arm-kernel

On Wed, Sep 18, 2013 at 03:01:31PM +0200, Krzysztof Hałasa wrote:
> Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> > I suspect it's this great bit of coding in free_solo_dev():
> 
> 		/* Now cleanup the PCI device */
> 		solo_irq_off(solo_dev, ~0);
> 		pci_iounmap(pdev, solo_dev->reg_base);
> 		if (pdev->irq)
> 			free_irq(pdev->irq, solo_dev);
> 
> solo_irq_off() tries to disable the IRQ by masking it in the hardware, but I
> guess there may be a time window in which a problem can show up.
> Anyway, there is no reason to unmap the registers before free_irq().

Except that it's part of debugging shared interrupts.

Remember, any driver which is still hooked into the IRQ handling subsystem
with a shared interrupt can still receive an interrupt at any moment due
to another device on that shared interrupt line raising its request.

So, in order to find buggy drivers, the IRQ layer will call a shared
interrupt handler immediately upon registration, and also upon freeing.

So, it's not a spurious IRQ, it's an attempt by the IRQ layer to find
buggy drivers, and it's done exactly that.  It's working as designed.


* BUG: sleeping function called from invalid context at arch/arm/mm/fault.c:301
@ 2013-09-18 21:00       ` Krzysztof Halasa
From: Krzysztof Halasa @ 2013-09-18 21:00 UTC
  To: linux-arm-kernel

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> So, in order to find buggy drivers, the IRQ layer will call a shared
> interrupt handler immediately upon registration, and also upon freeing.

Nice, I didn't know about that. I will have to check why the bug didn't
manifest itself before or after (I did rmmod many times and it fired
just once, right after I cut the video signal from the camera;
solo6x10 is an audio/video grabber and compressor).
-- 
Krzysztof Halasa

