Re: rtc: hang on boot during unregistration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>,
	Lars-Peter Clausen <lars@metafoo.de>, Tejun Heo <tj@kernel.org>,
	Greg KH <greg@kroah.com>,
	rtc-linux@googlegroups.com, LKML <linux-kernel@vger.kernel.org>
Subject: Re: rtc: hang on boot during unregistration
Date: Mon, 7 Apr 2014 12:33:28 -0700	[thread overview]
Message-ID: <20140407123328.1481dc51543235ce53e65254@linux-foundation.org> (raw)
In-Reply-To: <533F77F2.1050107@oracle.com>

On Fri, 04 Apr 2014 23:26:42 -0400 Sasha Levin <sasha.levin@oracle.com> wrote:

> Hi all,
> 
> I got the following spew while booting. My VM proceeded to hang and die.
> 
> [   16.620530] =============================================
> [   16.621059] [ INFO: possible recursive locking detected ]
> [   16.622024] 3.14.0-next-20140403-sasha-00019-g7474aa9-dirty #376 Not tainted
> [   16.624145] ---------------------------------------------
> [   16.625261] kworker/0:1/1069 is trying to acquire lock:
> [   16.626300] ((&(&kobj->release)->work)){+.+.+.}, at: flush_work (kernel/workqueue.c:2880)
> [   16.628258]
> [   16.628258] but task is already holding lock:
> [   16.629415] ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)

This is making my head spin a bit.  Why is lockdep babbling about
include/linux/workqueue.h:186 and similar sites which appear to have
nothing to do with locking?  Later it points a finger at
include/linux/jump_label.h:105, wtf.

> [   16.630106] other info that might help us debug this:
> [   16.630106]  Possible unsafe locking scenario:
> [   16.630106]
> [   16.630106]        CPU0
> [   16.630106]        ----
> [   16.630106]   lock((&(&kobj->release)->work));
> [   16.630106]   lock((&(&kobj->release)->work));
> [   16.630106]
> [   16.630106]  *** DEADLOCK ***
> [   16.630106]
> [   16.630106]  May be due to missing lock nesting notation
> [   16.630106]
> [   16.630106] 2 locks held by kworker/0:1/1069:
> [   16.630106] #0: ("events"){.+.+.+}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
> [   16.630106] #1: ((&(&kobj->release)->work)){+.+.+.}, at: process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
> [   16.630106]
> [   16.630106] stack backtrace:
> [   16.630106] CPU: 0 PID: 1069 Comm: kworker/0:1 Not tainted 3.14.0-next-20140403-sasha-00019-g7474aa9-dirty #376
> [   16.630106] Workqueue: events kobject_delayed_cleanup
> [   16.630106]  ffffffffb59d09d0 ffff88007c6a58d8 ffffffffb24bfb2f 0000000000000000
> [   16.630106]  ffffffffb59d09d0 ffff88007c6a59d8 ffffffffaf1c4c8a ffff88007c5a3d98
> [   16.630106]  ffff88007c5a3000 ffff880000000001 ffffffffaf1c0e18 ffff88007c5a3d60
> [   16.630106] Call Trace:
> [   16.630106] dump_stack (lib/dump_stack.c:52)
> [   16.630106] __lock_acquire (kernel/locking/lockdep.c:1740 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
> [   16.630106] ? check_irq_usage (kernel/locking/lockdep.c:1638)
> [   16.630106] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> [   16.630106] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [   16.630106] ? flush_work (kernel/workqueue.c:2880)
> [   16.630106] flush_work (kernel/workqueue.c:2884)
> [   16.630106] ? flush_work (kernel/workqueue.c:2880)
> [   16.630106] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
> [   16.630106] ? put_lock_stats.isra.12 (arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [   16.630106] ? try_to_grab_pending (kernel/workqueue.c:1260 kernel/workqueue.c:1259)
> [   16.630106] ? __cancel_work_timer (arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/workqueue.c:2913 (discriminator 2))
> [   16.630106] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [   16.630106] __cancel_work_timer (arch/x86/include/asm/bitops.h:313 kernel/workqueue.c:610 kernel/workqueue.c:644 kernel/workqueue.c:2916)
> [   16.630106] cancel_work_sync (kernel/workqueue.c:2941)
> [   16.630106] work_fixup_free (kernel/workqueue.c:479)
> [   16.630106] __debug_check_no_obj_freed (lib/debugobjects.c:277 lib/debugobjects.c:700)
> [   16.630106] debug_check_no_obj_freed (lib/debugobjects.c:726)
> [   16.630106] kfree (mm/slub.c:2679 mm/slub.c:3401)
> [   16.630106] ? rtc_device_release (drivers/rtc/class.c:35)
> [   16.630106] rtc_device_release (drivers/rtc/class.c:35)
> [   16.630106] device_release (drivers/base/core.c:246)
> [   16.630106] kobject_delayed_cleanup (lib/kobject.c:629 lib/kobject.c:638)
> [   16.630106] process_one_work (kernel/workqueue.c:2221 include/linux/jump_label.h:105 include/trace/events/workqueue.h:111 kernel/workqueue.c:2226)
> [   16.630106] ? process_one_work (include/linux/workqueue.h:186 kernel/workqueue.c:611 kernel/workqueue.c:638 kernel/workqueue.c:2214)
> [   16.630106] worker_thread (kernel/workqueue.c:2348)
> [   16.630106] ? rescuer_thread (kernel/workqueue.c:2297)
> [   16.630106] kthread (kernel/kthread.c:219)
> [   16.630106] ? kthread_create_on_node (kernel/kthread.c:185)
> [   16.630106] ret_from_fork (arch/x86/kernel/entry_64.S:555)
> [   16.630106] ? kthread_create_on_node (kernel/kthread.c:185)

I think rtc's role in this is to trigger case ODEBUG_STATE_ACTIVE in
__debug_check_no_obj_freed(), which might be due to an RTC bug.  This
then triggers problems in the workqueue core - calling flush_work() and
cancel_work_sync() within work_fixup_free() seems, umm, optimistic.

Maybe the semi-recursive code is OK and we're just missing lockdep
annotations, and the hang you saw was for other reasons.

Tejunstuff ;)

next prev parent reply	other threads:[~2014-04-07 19:33 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-05  3:26 rtc: hang on boot during unregistration Sasha Levin
2014-04-07 19:33 ` Andrew Morton [this message]
2014-04-08 12:55   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140407123328.1481dc51543235ce53e65254@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=a.zummo@towertech.it \
    --cc=greg@kroah.com \
    --cc=lars@metafoo.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rtc-linux@googlegroups.com \
    --cc=sasha.levin@oracle.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.