* rtc: hang on boot during unregistration
@ 2014-04-05 3:26 Sasha Levin
2014-04-07 19:33 ` Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2014-04-05 3:26 UTC (permalink / raw)
To: Alessandro Zummo, Lars-Peter Clausen; +Cc: Tejun Heo, Greg KH, rtc-linux, LKML
Hi all,
I got the following spew while booting. My VM proceeded to hang and die.
[ 16.620530] =============================================
[ 16.621059] [ INFO: possible recursive locking detected ]
[ 16.622024] 3.14.0-next-20140403-sasha-00019-g7474aa9-dirty #376 Not tainted
[ 16.624145] ---------------------------------------------
[ 16.625261] kworker/0:1/1069 is trying to acquire lock:
[ 16.626300] ((&(&kobj->release)->work)){+.+.+.}, at: flush_work (kernel/workqueue.c:2880)
[ 16.628258]
[ 16.628258] but task is already holding lock:
[ 16.629415] ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
[ 16.630106]
[ 16.630106] other info that might help us debug this:
[ 16.630106] Possible unsafe locking scenario:
[ 16.630106]
[ 16.630106] CPU0
[ 16.630106] ----
[ 16.630106] lock((&(&kobj->release)->work));
[ 16.630106] lock((&(&kobj->release)->work));
[ 16.630106]
[ 16.630106] *** DEADLOCK ***
[ 16.630106]
[ 16.630106] May be due to missing lock nesting notation
[ 16.630106]
[ 16.630106] 2 locks held by kworker/0:1/1069:
[ 16.630106] #0: ("events"){.+.+.+}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
[ 16.630106] #1: ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
[ 16.630106]
[ 16.630106] stack backtrace:
[ 16.630106] CPU: 0 PID: 1069 Comm: kworker/0:1 Not tainted 3.14.0-next-20140403-sasha-00019-g7474aa9-dirty #376
[ 16.630106] Workqueue: events kobject_delayed_cleanup
[ 16.630106] ffffffffb59d09d0 ffff88007c6a58d8 ffffffffb24bfb2f 0000000000000000
[ 16.630106] ffffffffb59d09d0 ffff88007c6a59d8 ffffffffaf1c4c8a ffff88007c5a3d98
[ 16.630106] ffff88007c5a3000 ffff880000000001 ffffffffaf1c0e18 ffff88007c5a3d60
[ 16.630106] Call Trace:
[ 16.630106] dump_stack (lib/dump_stack.c:52)
[ 16.630106] __lock_acquire (kernel/locking/lockdep.c:1740 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
[ 16.630106] ? check_irq_usage (kernel/locking/lockdep.c:1638)
[ 16.630106] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 16.630106] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 16.630106] ? flush_work (kernel/workqueue.c:2880)
[ 16.630106] flush_work (kernel/workqueue.c:2884)
[ 16.630106] ? flush_work (kernel/workqueue.c:2880)
[ 16.630106] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[ 16.630106] ? put_lock_stats.isra.12 (arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[ 16.630106] ? try_to_grab_pending (kernel/workqueue.c:1260 kernel/workqueue.c:1259)
[ 16.630106] ? __cancel_work_timer (arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/workqueue.c:2913 (discriminator 2))
[ 16.630106] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 16.630106] __cancel_work_timer (arch/x86/include/asm/bitops.h:313 kernel/workqueue.c:610 kernel/workqueue.c:644 kernel/workqueue.c:2916)
[ 16.630106] cancel_work_sync (kernel/workqueue.c:2941)
[ 16.630106] work_fixup_free (kernel/workqueue.c:479)
[ 16.630106] __debug_check_no_obj_freed (lib/debugobjects.c:277 lib/debugobjects.c:700)
[ 16.630106] debug_check_no_obj_freed (lib/debugobjects.c:726)
[ 16.630106] kfree (mm/slub.c:2679 mm/slub.c:3401)
[ 16.630106] ? rtc_device_release (drivers/rtc/class.c:35)
[ 16.630106] rtc_device_release (drivers/rtc/class.c:35)
[ 16.630106] device_release (drivers/base/core.c:246)
[ 16.630106] kobject_delayed_cleanup (lib/kobject.c:629 lib/kobject.c:638)
[ 16.630106] process_one_work (kernel/workqueue.c:2221 include/linux/jump_label.h:105 include/trace/events/workqueue.h:111 kernel/workqueue.c:2226)
[ 16.630106] ? process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
[ 16.630106] worker_thread (kernel/workqueue.c:2348)
[ 16.630106] ? rescuer_thread (kernel/workqueue.c:2297)
[ 16.630106] kthread (kernel/kthread.c:219)
[ 16.630106] ? kthread_create_on_node (kernel/kthread.c:185)
[ 16.630106] ret_from_fork (arch/x86/kernel/entry_64.S:555)
[ 16.630106] ? kthread_create_on_node (kernel/kthread.c:185)
Thanks,
Sasha
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: rtc: hang on boot during unregistration
2014-04-05 3:26 rtc: hang on boot during unregistration Sasha Levin
@ 2014-04-07 19:33 ` Andrew Morton
2014-04-08 12:55 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2014-04-07 19:33 UTC (permalink / raw)
To: Sasha Levin
Cc: Alessandro Zummo, Lars-Peter Clausen, Tejun Heo, Greg KH,
rtc-linux, LKML
On Fri, 04 Apr 2014 23:26:42 -0400 Sasha Levin <sasha.levin@oracle.com> wrote:
> Hi all,
>
> I got the following spew while booting. My VM proceeded to hang and die.
>
> [ 16.620530] =============================================
> [ 16.621059] [ INFO: possible recursive locking detected ]
> [ 16.622024] 3.14.0-next-20140403-sasha-00019-g7474aa9-dirty #376 Not tainted
> [ 16.624145] ---------------------------------------------
> [ 16.625261] kworker/0:1/1069 is trying to acquire lock:
> [ 16.626300] ((&(&kobj->release)->work)){+.+.+.}, at: flush_work (kernel/workqueue.c:2880)
> [ 16.628258]
> [ 16.628258] but task is already holding lock:
> [ 16.629415] ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
This is making my head spin a bit. Why is lockdep babbling about
include/linux/workqueue.h:186 and similar sites which appear to have
nothing to do with locking? Later it points a finger at
include/linux/jump_label.h:105, wtf.
> [ 16.630106] other info that might help us debug this:
> [ 16.630106] Possible unsafe locking scenario:
> [ 16.630106]
> [ 16.630106] CPU0
> [ 16.630106] ----
> [ 16.630106] lock((&(&kobj->release)->work));
> [ 16.630106] lock((&(&kobj->release)->work));
> [ 16.630106]
> [ 16.630106] *** DEADLOCK ***
> [ 16.630106]
> [ 16.630106] May be due to missing lock nesting notation
> [ 16.630106]
> [ 16.630106] 2 locks held by kworker/0:1/1069:
> [ 16.630106] #0: ("events"){.+.+.+}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
> [ 16.630106] #1: ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
> [ 16.630106]
> [ 16.630106] stack backtrace:
> [ 16.630106] CPU: 0 PID: 1069 Comm: kworker/0:1 Not tainted 3.14.0-next-20140403-sasha-00019-g7474aa9-dirty #376
> [ 16.630106] Workqueue: events kobject_delayed_cleanup
> [ 16.630106] ffffffffb59d09d0 ffff88007c6a58d8 ffffffffb24bfb2f 0000000000000000
> [ 16.630106] ffffffffb59d09d0 ffff88007c6a59d8 ffffffffaf1c4c8a ffff88007c5a3d98
> [ 16.630106] ffff88007c5a3000 ffff880000000001 ffffffffaf1c0e18 ffff88007c5a3d60
> [ 16.630106] Call Trace:
> [ 16.630106] dump_stack (lib/dump_stack.c:52)
> [ 16.630106] __lock_acquire (kernel/locking/lockdep.c:1740 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
> [ 16.630106] ? check_irq_usage (kernel/locking/lockdep.c:1638)
> [ 16.630106] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> [ 16.630106] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [ 16.630106] ? flush_work (kernel/workqueue.c:2880)
> [ 16.630106] flush_work (kernel/workqueue.c:2884)
> [ 16.630106] ? flush_work (kernel/workqueue.c:2880)
> [ 16.630106] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
> [ 16.630106] ? put_lock_stats.isra.12 (arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 16.630106] ? try_to_grab_pending (kernel/workqueue.c:1260 kernel/workqueue.c:1259)
> [ 16.630106] ? __cancel_work_timer (arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/workqueue.c:2913 (discriminator 2))
> [ 16.630106] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 16.630106] __cancel_work_timer (arch/x86/include/asm/bitops.h:313 kernel/workqueue.c:610 kernel/workqueue.c:644 kernel/workqueue.c:2916)
> [ 16.630106] cancel_work_sync (kernel/workqueue.c:2941)
> [ 16.630106] work_fixup_free (kernel/workqueue.c:479)
> [ 16.630106] __debug_check_no_obj_freed (lib/debugobjects.c:277 lib/debugobjects.c:700)
> [ 16.630106] debug_check_no_obj_freed (lib/debugobjects.c:726)
> [ 16.630106] kfree (mm/slub.c:2679 mm/slub.c:3401)
> [ 16.630106] ? rtc_device_release (drivers/rtc/class.c:35)
> [ 16.630106] rtc_device_release (drivers/rtc/class.c:35)
> [ 16.630106] device_release (drivers/base/core.c:246)
> [ 16.630106] kobject_delayed_cleanup (lib/kobject.c:629 lib/kobject.c:638)
> [ 16.630106] process_one_work (kernel/workqueue.c:2221 include/linux/jump_label.h:105 include/trace/events/workqueue.h:111 kernel/workqueue.c:2226)
> [ 16.630106] ? process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
> [ 16.630106] worker_thread (kernel/workqueue.c:2348)
> [ 16.630106] ? rescuer_thread (kernel/workqueue.c:2297)
> [ 16.630106] kthread (kernel/kthread.c:219)
> [ 16.630106] ? kthread_create_on_node (kernel/kthread.c:185)
> [ 16.630106] ret_from_fork (arch/x86/kernel/entry_64.S:555)
> [ 16.630106] ? kthread_create_on_node (kernel/kthread.c:185)
I think rtc's role in this is to trigger case ODEBUG_STATE_ACTIVE in
__debug_check_no_obj_freed(), which might be due to an RTC bug. This
then triggers problems in the workqueue core - calling flush_work() and
cancel_work_sync() within work_fixup_free() seems, umm, optimistic.
Maybe the semi-recursive code is OK and we're just missing lockdep
annotations, and the hang you saw was for other reasons.
Tejunstuff ;)
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: rtc: hang on boot during unregistration
2014-04-07 19:33 ` Andrew Morton
@ 2014-04-08 12:55 ` Tejun Heo
0 siblings, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2014-04-08 12:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Sasha Levin, Alessandro Zummo, Lars-Peter Clausen, Greg KH,
rtc-linux, LKML
Hello,
On Mon, Apr 07, 2014 at 12:33:28PM -0700, Andrew Morton wrote:
> > [ 16.625261] kworker/0:1/1069 is trying to acquire lock:
> > [ 16.626300] ((&(&kobj->release)->work)){+.+.+.}, at: flush_work (kernel/workqueue.c:2880)
> > [ 16.628258]
> > [ 16.628258] but task is already holding lock:
> > [ 16.629415] ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
>
> This is making my head spin a bit. Why is lockdep babbling about
> include/linux/workqueue.h:186 and similar sites which appear to have
> nothing to do with locking? Later it points a finger at
> include/linux/jump_label.h:105, wtf.
flush_work() and flush_workqueue() make use of lockdep anntations so
that dependency problems involving them can be detected through
lockdep.
> > [ 16.630106] other info that might help us debug this:
> > [ 16.630106] Possible unsafe locking scenario:
> > [ 16.630106]
> > [ 16.630106] CPU0
> > [ 16.630106] ----
> > [ 16.630106] lock((&(&kobj->release)->work));
> > [ 16.630106] lock((&(&kobj->release)->work));
Looks like a work item is trying to flush itself.
> > [ 16.630106] Call Trace:
> > [ 16.630106] dump_stack (lib/dump_stack.c:52)
> > [ 16.630106] __lock_acquire (kernel/locking/lockdep.c:1740 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
> > [ 16.630106] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> > [ 16.630106] flush_work (kernel/workqueue.c:2884)
> > [ 16.630106] __cancel_work_timer (arch/x86/include/asm/bitops.h:313 kernel/workqueue.c:610 kernel/workqueue.c:644 kernel/workqueue.c:2916)
> > [ 16.630106] cancel_work_sync (kernel/workqueue.c:2941)
> > [ 16.630106] work_fixup_free (kernel/workqueue.c:479)
Hah, this is debugobj fixup code trying to perform cancel_work_sync()
to make the object unused because it's being freed while active;
however, because free() is being called from the work item itself,
it's falling into deadlock and triggering the lockdep anntation.
> > [ 16.630106] __debug_check_no_obj_freed (lib/debugobjects.c:277 lib/debugobjects.c:700)
> > [ 16.630106] debug_check_no_obj_freed (lib/debugobjects.c:726)
> > [ 16.630106] kfree (mm/slub.c:2679 mm/slub.c:3401)
> > [ 16.630106] rtc_device_release (drivers/rtc/class.c:35)
IOW, rtc_device_release() is freeing the memory area this work item is
embedded in while the work item is queued on a workqueue.
> > [ 16.630106] device_release (drivers/base/core.c:246)
> > [ 16.630106] kobject_delayed_cleanup (lib/kobject.c:629 lib/kobject.c:638)
> > [ 16.630106] process_one_work (kernel/workqueue.c:2221 include/linux/jump_label.h:105 include/trace/events/workqueue.h:111 kernel/workqueue.c:2226)
> > [ 16.630106] worker_thread (kernel/workqueue.c:2348)
> > [ 16.630106] kthread (kernel/kthread.c:219)
> > [ 16.630106] ret_from_fork (arch/x86/kernel/entry_64.S:555)
>
> I think rtc's role in this is to trigger case ODEBUG_STATE_ACTIVE in
> __debug_check_no_obj_freed(), which might be due to an RTC bug. This
> then triggers problems in the workqueue core - calling flush_work() and
> cancel_work_sync() within work_fixup_free() seems, umm, optimistic.
It's debugobj fixup code which is opportunistically trying to take the
work item off workqueue to avoid use-after-free. As free() is being
called from work item itself, it falls into A-A deadlock.
> Maybe the semi-recursive code is OK and we're just missing lockdep
> annotations, and the hang you saw was for other reasons.
I've been always a bit suspicious about the fixup parts of debugobj.
I suppose they aren't too harmful as these types of chained failures
always follow proper error messages from debugobj itself. Anyways,
the lockdep annotation is only because debugobj fixup isn't too smart.
I'm not quite sure we wanna try harder tho.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-04-08 12:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-05 3:26 rtc: hang on boot during unregistration Sasha Levin
2014-04-07 19:33 ` Andrew Morton
2014-04-08 12:55 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).