From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755030AbbJUNU0 (ORCPT ); Wed, 21 Oct 2015 09:20:26 -0400 Received: from mail-wi0-f171.google.com ([209.85.212.171]:34344 "EHLO mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754899AbbJUNUW (ORCPT ); Wed, 21 Oct 2015 09:20:22 -0400 To: linux-kernel@vger.kernel.org Cc: Peter Zijlstra , Ingo Molnar , Juri Lelli From: Luca Abeni Subject: lockdep-related warning in kernel/sched/deadline.c::find_lock_later_rq() Message-ID: <56279107.3010602@unitn.it> Date: Wed, 21 Oct 2015 15:20:07 +0200 User-Agent: Mozilla/5.0 (X11; Linux i686; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi all, after fixing task migrations for SCHED_DEADLINE, I started to see some lockdep-related warnings that look like this: [ 794.428081] WARNING: CPU: 1 PID: 0 at /home/luca/Src/GRUB/linux-reclaiming/kernel/locking/lockdep.c:3407 lock_release+0x3f4/0x440() [ 794.428439] releasing a pinned lock [ 794.428439] Modules linked in: [ 794.428439] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.3.0-rc5+ #32 [ 794.428439] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011 [ 794.428439] ffffffff81c946d0 ffff880007b03cc8 ffffffff8135abd2 ffff880007b03d10 [ 794.428439] ffff880007b03d00 ffffffff810546f1 ffff880007b159d8 ffffffff81096663 [ 794.428439] ffff88000754c558 0000000000000001 ffff88000754bd80 ffff880007b03d60 [ 794.428439] Call Trace: [ 794.428439] [] dump_stack+0x4b/0x69 [ 794.428439] [] warn_slowpath_common+0x81/0xc0 [ 794.428439] [] ? find_lock_later_rq+0x103/0x180 [ 794.428439] [] warn_slowpath_fmt+0x47/0x50 [ 794.428439] [] lock_release+0x3f4/0x440 [ 794.428439] [] _raw_spin_unlock+0x1a/0x30 [ 794.428439] [] find_lock_later_rq+0x103/0x180 [ 794.428439] [] push_dl_task.part.33+0x61/0x190 [ 794.428439] [] dl_task_timer+0x194/0x2e0 [ 794.428439] [] ? task_woken_dl+0x60/0x60 [ 794.428439] [] __hrtimer_run_queues+0x107/0x470 [ 794.428439] [] ? hrtimer_interrupt+0x82/0x1b0 [ 794.428439] [] hrtimer_interrupt+0xa6/0x1b0 [ 794.428439] [] local_apic_timer_interrupt+0x30/0x60 [ 794.428439] [] smp_apic_timer_interrupt+0x38/0x50 [ 794.428439] [] apic_timer_interrupt+0x84/0x90 [ 794.428439] [] ? default_idle+0x1b/0x140 [ 794.428439] [] ? default_idle+0x19/0x140 [ 794.428439] [] arch_cpu_idle+0xa/0x10 [ 794.428439] [] default_idle_call+0x25/0x40 [ 794.428439] [] cpu_startup_entry+0x310/0x390 [ 794.428439] [] start_secondary+0xf3/0x100 [ 794.428439] ---[ end trace 1d35bf076299f814 ]--- (this happended on KVM, but I can trigger similar warnings on real hardware too). This was not happening before my last patch, because push_dl_task() never migrated any task, and always returned before calling find_lock_later_rq()... Now, if I understand correctly the issue is that dl_task_timer() does: rq = task_rq_lock(p, &flags); [...] if (has_pushable_dl_tasks(rq)) push_dl_task(rq); with task_rq_lock() that pins rq->lock and push_tl_task() that invokes find_lock_later_rq() that unlocks rq->lock() while it is pinned. I am not sure about how to fix this issue: as a first try, I did diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 142df26..5b1ba95 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -668,8 +668,11 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer) * Queueing this task back might have overloaded rq, check if we need * to kick someone away. */ - if (has_pushable_dl_tasks(rq)) + if (has_pushable_dl_tasks(rq)) { + lockdep_unpin_lock(&rq->lock); push_dl_task(rq); + lockdep_pin_lock(&rq->lock); + } #endif unlock: This removes the warning, but I am not sure if it is the correct fix (is it valid to unpin rq->lock, here?). If someone can confirm that this is the correct approach, I'll test the patch a little bit more and then I'll send a properly signed-off patch; otherwise, if someone can suggest the correct approach I'll try it. Thanks, Luca