From: John Stultz
Date: Sun, 08 Dec 2013 19:36:47 -0800
To: Sasha Levin, Ingo Molnar, Peter Zijlstra, tglx@linutronix.de
CC: LKML
Subject: Re: sched/timekeeping: lockdep spew
Message-ID: <52A53ACF.6010409@linaro.org>
In-Reply-To: <52A50490.4080705@oracle.com>

On 12/08/2013 03:45 PM, Sasha Levin wrote:
> Hi all,
>
> I've stumbled on this spew inside a KVM tools guest running latest
> -next kernel:
>
> [ 251.100221] ======================================================
> [ 251.100221] [ INFO: possible circular locking dependency detected ]
> [ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
> [ 251.101967] -------------------------------------------------------
> [ 251.101967] kworker/10:1/4506 is trying to acquire lock:
> [ 251.101967]  (timekeeper_seq){----..}, at: [] retrigger_next_event+0x56/0x70
> [ 251.101967]
> [ 251.101967] but task is already holding lock:
> [ 251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [] retrigger_next_event+0x3c/0x70
> [ 251.101967]
> [ 251.101967] which lock already depends on the new lock.
> [ 251.101967]
> [ 251.101967]
> [ 251.101967] the existing dependency chain (in reverse order) is:
> [ 251.101967]
> -> #5 (hrtimer_bases.lock#11){-.-...}:
> [ 251.101967] [] validate_chain+0x6c3/0x7b0
> [ 251.101967] [] __lock_acquire+0x4ad/0x580
> [ 251.101967] [] lock_acquire+0x182/0x1d0
> [ 251.101967] [] _raw_spin_lock+0x40/0x80
> [ 251.101967] [] __hrtimer_start_range_ns+0x208/0x4c0
> [ 251.101967] [] start_bandwidth_timer+0x50/0x60
> [ 251.101967] [] __enqueue_rt_entity+0x23e/0x290
> [ 251.101967] [] enqueue_rt_entity+0x6b/0x80
> [ 251.101967] [] enqueue_task_rt+0x36/0xb0
> [ 251.101967] [] enqueue_task+0x52/0x60
> [ 251.101967] [] activate_task+0x22/0x30
> [ 251.101967] [] ttwu_activate+0x21/0x50
> [ 251.101967] [] T.2138+0x3c/0x60
> [ 251.101967] [] ttwu_queue+0xb6/0xe0
> [ 251.101967] [] try_to_wake_up+0x264/0x2a0
> [ 251.101967] [] wake_up_process+0x3f/0x50
> [ 251.101967] [] watchdog_timer_fn+0x4c/0x1b0
> [ 251.101967] [] __run_hrtimer+0x1f6/0x310
> [ 251.101967] [] hrtimer_run_queues+0x196/0x1d0
> [ 251.101967] [] run_local_timers+0xe/0x20
> [ 251.101967] [] update_process_times+0x3d/0x80
> [ 251.101967] [] tick_periodic+0xab/0xc0
> [ 251.101967] [] tick_handle_periodic+0x24/0x80
> [ 251.101967] [] local_apic_timer_interrupt+0x5d/0x70
> [ 251.101967] [] smp_apic_timer_interrupt+0x45/0x60
> [ 251.101967] [] apic_timer_interrupt+0x72/0x80
> [ 251.101967] [] default_idle+0x11d/0x260
> [ 251.101967] [] arch_cpu_idle+0x18/0x50
> [ 251.101967] [] cpu_idle_loop+0x351/0x460
> [ 251.101967] [] cpu_startup_entry+0x70/0x80
> [ 251.101967] [] start_secondary+0xf3/0x100
> [ 251.101967]
> -> #4 (&rt_b->rt_runtime_lock){-.-...}:
> [ 251.101967] [] validate_chain+0x6c3/0x7b0
> [ 251.101967] [] __lock_acquire+0x4ad/0x580
> [ 251.101967] [] lock_acquire+0x182/0x1d0
> [ 251.101967] [] _raw_spin_lock+0x40/0x80
> [ 251.101967] [] __enqueue_rt_entity+0x229/0x290
> [ 251.101967] [] enqueue_rt_entity+0x6b/0x80
> [ 251.101967] [] enqueue_task_rt+0x36/0xb0
> [ 251.101967] [] enqueue_task+0x52/0x60
> [ 251.101967] [] __sched_setscheduler+0x33b/0x3f0
> [ 251.101967] [] sched_setscheduler_nocheck+0x10/0x20
> [ 251.101967] [] rcu_cpu_kthread_setup+0x2b/0x30
> [ 251.101967] [] smpboot_thread_fn+0x1ed/0x2c0
> [ 251.101967] [] kthread+0x105/0x110
> [ 251.101967] [] ret_from_fork+0x7c/0xb0
> [ 251.101967]
> -> #3 (&rq->lock){-.-.-.}:
> [ 251.101967] [] validate_chain+0x6c3/0x7b0
> [ 251.101967] [] __lock_acquire+0x4ad/0x580
> [ 251.101967] [] lock_acquire+0x182/0x1d0
> [ 251.101967] [] _raw_spin_lock+0x40/0x80
> [ 251.101967] [] wake_up_new_task+0x149/0x270
> [ 251.101967] [] do_fork+0x1ba/0x270
> [ 251.101967] [] kernel_thread+0x26/0x30
> [ 251.101967] [] rest_init+0x26/0x150
> [ 251.101967] [] start_kernel+0x3b9/0x3c0
> [ 251.101967] [] x86_64_start_reservations+0x2a/0x2c
> [ 251.101967] [] x86_64_start_kernel+0x186/0x195
> [ 251.101967]
> -> #2 (&p->pi_lock){-.-.-.}:
> [ 251.101967] [] validate_chain+0x6c3/0x7b0
> [ 251.101967] [] __lock_acquire+0x4ad/0x580
> [ 251.101967] [] lock_acquire+0x182/0x1d0
> [ 251.101967] [] _raw_spin_lock_irqsave+0x91/0xd0
> [ 251.101967] [] try_to_wake_up+0x39/0x2a0
> [ 251.101967] [] wake_up_process+0x3f/0x50
> [ 251.101967] [] start_worker+0x2a/0x40
> [ 251.101967] [] create_and_start_worker+0x4d/0x90
> [ 251.101967] [] init_workqueues+0x192/0x3cb
> [ 251.101967] [] do_one_initcall+0xca/0x1e0
> [ 251.101967] [] kernel_init_freeable+0x2b4/0x354
> [ 251.101967] [] kernel_init+0xe/0x130
> [ 251.101967] [] ret_from_fork+0x7c/0xb0
> [ 251.101967]
> -> #1 (&(&pool->lock)->rlock){-.-...}:
> [ 251.101967] [] validate_chain+0x6c3/0x7b0
> [ 251.101967] [] __lock_acquire+0x4ad/0x580
> [ 251.101967] [] lock_acquire+0x182/0x1d0
> [ 251.101967] [] _raw_spin_lock+0x40/0x80
> [ 251.101967] [] __queue_work+0x1a9/0x3f0
> [ 251.101967] [] queue_work_on+0x98/0x120
> [ 251.101967] [] clock_was_set_delayed+0x21/0x30
> [ 251.101967] [] do_adjtimex+0x111/0x160
> [ 251.101967] [] compat_sys_adjtimex+0x41/0x70
> [ 251.101967] [] ia32_sysret+0x0/0x5

Hrm... So it looks like clock_was_set_delayed() is no longer safe to
call while holding the timekeeper lock. Which is problematic, since
clock_was_set_delayed() was introduced to avoid exactly this sort of
problem (a timekeeper->hrtimer, hrtimer->timekeeper circular deadlock).
But at least yay for lockdep catching these sorts of things.

Not sure if this is really new in -next or just preexisting and hard to
trigger? I'll have to look more closely at it tomorrow after I've had
some coffee.

thanks
-john
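As a quick illustration of why lockdep complains (a toy sketch, not kernel code; lock names are abbreviated), the splat's chain, read as "lock A is held while lock B is taken", forms a closed loop once the new hrtimer_bases.lock -> timekeeper_seq dependency from retrigger_next_event() is added:

```python
# Toy illustration of the lockdep report above, NOT kernel code.
# Each edge means "the first lock was held while the second was acquired",
# taken from the #1..#5 stacks plus the newly attempted acquisition.
edges = {
    "timekeeper_seq": ["pool->lock"],                 # #1: do_adjtimex -> clock_was_set_delayed -> queue_work_on
    "pool->lock": ["p->pi_lock"],                     # #2: start_worker -> try_to_wake_up
    "p->pi_lock": ["rq->lock"],                       # #3: wake_up_new_task
    "rq->lock": ["rt_b->rt_runtime_lock"],            # #4: __enqueue_rt_entity
    "rt_b->rt_runtime_lock": ["hrtimer_bases.lock"],  # #5: start_bandwidth_timer
    "hrtimer_bases.lock": ["timekeeper_seq"],         # new: retrigger_next_event
}

def find_cycle(graph):
    """Return one lock-ordering cycle as a list of lock names, or None."""
    for start in graph:
        stack = [(start, [start])]
        seen = set()
        while stack:
            node, path = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt == start:        # closed the loop back to the start lock
                    return path + [nxt]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + [nxt]))
    return None

print(" -> ".join(find_cycle(edges)))
# timekeeper_seq -> pool->lock -> p->pi_lock -> rq->lock ->
#   rt_b->rt_runtime_lock -> hrtimer_bases.lock -> timekeeper_seq
```

This is essentially the check lockdep performs on its dependency graph: any path from the about-to-be-acquired lock back to a currently held lock means the reverse acquisition order is possible somewhere, hence a potential deadlock even if it never actually fires.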