From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755617AbbBPLwm (ORCPT ); Mon, 16 Feb 2015 06:52:42 -0500 Received: from forward11.mail.yandex.net ([95.108.130.93]:42373 "EHLO forward11.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755526AbbBPLwi (ORCPT ); Mon, 16 Feb 2015 06:52:38 -0500 From: Kirill Tkhai To: Fengguang Wu , Peter Zijlstra Cc: Ingo Molnar , LKP , "linux-kernel@vger.kernel.org" In-Reply-To: <20150216072038.GA17056@wfg-t540p.sh.intel.com> References: <20150216072038.GA17056@wfg-t540p.sh.intel.com> Subject: Re: [sched/deadline] kernel BUG at kernel/sched/deadline.c:805! MIME-Version: 1.0 Message-Id: <9846861424087552@web13j.yandex.ru> X-Mailer: Yamail [ http://yandex.ru ] 5.0 Date: Mon, 16 Feb 2015 14:52:32 +0300 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=koi8-r Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 16.02.2015, 10:21, "Fengguang Wu" : > Hi Peter, > > We got the below dmesg and the first bad commit is > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git revert-d347efeb16d3d5150cb7f8d50b05f388b572840e-1019a359d3dc4b64d0e1e5a5efcb725d5e83994d > > commit 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d > Author: ššššPeter Zijlstra > AuthorDate: Wed Nov 26 08:44:03 2014 +0800 > Commit: ššššIngo Molnar > CommitDate: Wed Feb 4 07:52:26 2015 +0100 > > ššššsched/deadline: Fix stale yield state > > ššššWhen we fail to start the deadline timer in update_curr_dl(), we > ššššforget to clear ->dl_yielded, resulting in wrecked time keeping. > > ššššSince the natural place to clear both ->dl_yielded and ->dl_throttled > ššššis in replenish_dl_entity(); both are after all waiting for that event; > ššššmake it so. > > ššššLuckily since 67dfa1b756f2 ("sched/deadline: Implement > ššššcancel_dl_timer() to use in switched_from_dl()") the > šššštask_on_rq_queued() condition in dl_task_timer() must be true, and can > šššštherefore call enqueue_task_dl() unconditionally. > > ššššReported-by: Wanpeng Li > ššššSigned-off-by: Peter Zijlstra (Intel) > ššššCc: Kirill Tkhai > ššššCc: Juri Lelli > ššššCc: Linus Torvalds > ššššLink: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li@linux.intel.com > ššššSigned-off-by: Ingo Molnar > > =================================================== > PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT! > =================================================== > > +------------------------------------------------+------------+------------+------------+ > | ššššššššššššššššššššššššššššššššššššššššššššššš| a7bebf4887 | 1019a359d3 | 58e0e18cb8 | > +------------------------------------------------+------------+------------+------------+ > | boot_successes šššššššššššššššššššššššššššššššš| 886 ššššššš| 54 šššššššš| 10 šššššššš| > | boot_failures ššššššššššššššššššššššššššššššššš| 314 ššššššš| 246 ššššššš| 4 ššššššššš| > | BUG:kernel_boot_crashed ššššššššššššššššššššššš| 48 šššššššš| 20 šššššššš| 1 ššššššššš| > | BUG:kernel_test_crashed ššššššššššššššššššššššš| 16 šššššššš| 2 ššššššššš| ššššššššššš| > | BUG:kernel_boot_hang šššššššššššššššššššššššššš| 213 ššššššš| 43 šššššššš| ššššššššššš| > | Kernel_panic-not_syncing:softlockup:hung_tasks | 37 šššššššš| 3 ššššššššš| ššššššššššš| > | backtrace:ring_buffer_consumer_thread ššššššššš| 17 šššššššš| 3 ššššššššš| ššššššššššš| > | EIP_is_at_get_node_by_name šššššššššššššššššššš| 4 ššššššššš| ššššššššššš| ššššššššššš| > | backtrace:gcov_enable_events šššššššššššššššššš| 17 šššššššš| ššššššššššš| ššššššššššš| > | backtrace:gcov_fs_init šššššššššššššššššššššššš| 17 šššššššš| ššššššššššš| ššššššššššš| > | backtrace:kernel_init_freeable šššššššššššššššš| 20 šššššššš| 178 ššššššš| 3 ššššššššš| > | EIP_is_at_strcmp šššššššššššššššššššššššššššššš| 7 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at__slab_alloc ššššššššššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | backtrace:__ring_buffer_alloc ššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | backtrace:ring_buffer_benchmark_init šššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_lock_release šššššššššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | backtrace:debug_mutex_init šššššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_strncmp ššššššššššššššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_do_raw_spin_unlock šššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at__do_softirq ššššššššššššššššššššššššš| 6 ššššššššš| 2 ššššššššš| ššššššššššš| > | EIP_is_at_gcov_info_filename šššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_ring_buffer_consume ššššššššššššššššš| 8 ššššššššš| 1 ššššššššš| ššššššššššš| > | EIP_is_at_raw_spin_unlock_irqrestore šššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_ftrace_ops_list_func šššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | backtrace:event_trace_self_tests šššššššššššššš| 2 ššššššššš| ššššššššššš| ššššššššššš| > | backtrace:event_trace_self_tests_init ššššššššš| 2 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_ftrace_likely_update šššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_rb_reserve_next_event ššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | EIP_is_at_raw_spin_unlock_irq ššššššššššššššššš| 1 ššššššššš| ššššššššššš| ššššššššššš| > | kernel_BUG_at_kernel/sched/deadline.c ššššššššš| 0 ššššššššš| 178 ššššššš| 3 ššššššššš| > | invalid_opcode šššššššššššššššššššššššššššššššš| 0 ššššššššš| 178 ššššššš| 3 ššššššššš| > | EIP_is_at_enqueue_task_dl ššššššššššššššššššššš| 0 ššššššššš| 178 ššššššš| 3 ššššššššš| > | Kernel_panic-not_syncing:Fatal_exception šššššš| 0 ššššššššš| 178 ššššššš| 3 ššššššššš| > | backtrace:register_tracer ššššššššššššššššššššš| 0 ššššššššš| 178 ššššššš| 3 ššššššššš| > | backtrace:init_wakeup_tracer šššššššššššššššššš| 0 ššššššššš| 178 ššššššš| 3 ššššššššš| > +------------------------------------------------+------------+------------+------------+ > > [ ššš6.432010] Testing ftrace regs: PASSED > [ ššš6.530393] Testing tracer wakeup: > [ ššš7.648615] ------------[ cut here ]------------ > [ ššš7.649000] kernel BUG at kernel/sched/deadline.c:805! > [ ššš7.649000] invalid opcode: 0000 [#1] SMP > [ ššš7.649000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc6-00041-g1019a35 #17 > [ ššš7.649000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 > [ ššš7.649000] task: d3348000 ti: d3344000 task.ti: d3344000 > [ ššš7.649000] EIP: 0060:[] EFLAGS: 00010046 CPU: 0 > [ ššš7.649000] EIP is at enqueue_task_dl+0x341/0x1860 > [ ššš7.649000] EAX: d2940101 EBX: 00000001 ECX: 00000000 EDX: 0085a491 > [ ššš7.649000] ESI: d2940000 EDI: d2940101 EBP: d3345e08 ESP: d3345dc0 > [ ššš7.649000] šDS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [ ššš7.649000] CR0: 8005003b CR2: ffffffff CR3: 04e8b000 CR4: 000006d0 > [ ššš7.649000] Stack: > [ ššš7.649000] šc1170353 00000002 d3345de0 c11ab53c d391ec10 00000046 d33485c0 d3348000 > [ ššš7.649000] š00000002 d29401b8 d391ec00 d29401b8 00000046 00000005 00000000 d2940000 > [ ššš7.649000] šd391ec00 c36efc80 d3345e20 c115991a 00000005 00000000 d391ec00 00000000 > [ ššš7.649000] Call Trace: > [ ššš7.649000] š[] ? sched_clock_cpu+0x293/0x2e0 > [ ššš7.649000] š[] ? __lock_is_held+0x6c/0xa0 > [ ššš7.649000] š[] enqueue_task+0xba/0x220 > [ ššš7.649000] š[] activate_task+0x4c/0x80 > [ ššš7.649000] š[] ttwu_activate+0x21/0x90 > [ ššš7.649000] š[] ttwu_do_activate+0x64/0x90 > [ ššš7.649000] š[] try_to_wake_up+0x366/0x8b0 > [ ššš7.649000] š[] wake_up_process+0xa0/0xe0 > [ ššš7.649000] š[] kthread_stop+0x21b/0xd90 > [ ššš7.649000] š[] trace_selftest_startup_wakeup+0x3b7/0x4b0 > [ ššš7.649000] š[] ? wait_for_completion+0x1f0/0x290 > [ ššš7.649000] š[] register_tracer+0x2d4/0x700 > [ ššš7.649000] š[] init_wakeup_tracer+0x1d/0xea > [ ššš7.649000] š[] do_one_initcall+0x278/0x44c > [ ššš7.649000] š[] ? init_function_trace+0x1f8/0x1f8 > [ ššš7.649000] š[] kernel_init_freeable+0x263/0x424 > [ ššš7.649000] š[] kernel_init+0x1c/0x2a0 > [ ššš7.649000] š[] ret_from_kernel_thread+0x21/0x30 > [ ššš7.649000] š[] ? rest_init+0x330/0x330 > [ ššš7.649000] Code: da e8 94 96 13 00 8b 04 9d 84 f6 77 c4 83 c0 01 89 04 9d 84 f6 77 c4 89 f8 84 c0 74 4d 83 05 c8 14 f2 c4 01 83 15 cc 14 f2 c4 00 <0f> 0b 83 05 d0 14 f2 c4 01 83 15 d4 14 f2 c4 00 8d b4 26 00 00 > [ ššš7.649000] EIP: [] enqueue_task_dl+0x341/0x1860 SS:ESP 0068:d3345dc0 > [ ššš7.649000] ---[ end trace cb4bdebe33135894 ]--- > [ ššš7.649000] Kernel panic - not syncing: Fatal exception > > git bisect start 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe bfa76d49576599a4b9f9b7a71f23d73d6dcff735 -- > git bisect šbad 30496a920e97b97e16d078c520eef323344ef997 š# 23:32 ššššš0- šššš67 šMerge 'amirv/for-upstream' into devel-xian-smoke-201502152116 > git bisect good d170f4905ee6f93eb12f87d018dd4b9acea18f2c š# 00:27 ššš300+ ššš120 š0day base guard for 'devel-xian-smoke-201502152116' > git bisect šbad c5ce28df0e7c01a1de23c36ebdefcd803f2b6cbb š# 01:11 ššššš0- šššš72 šMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next > git bisect good 57ee062e4baa9532e5a3835a2d185590a3fc8bd3 š# 01:51 ššš300+ šššš94 šMerge branch 'r8152' > git bisect šbad c08f8467939e7d2eebcba7cf2330242c4f53f2f7 š# 02:10 ššššš0- ššš103 šMerge tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci > git bisect šbad 5b9b28a63f2e47dac5ff3a2503bfe3ade8796aa0 š# 02:24 ššššš0- ššššš5 šMerge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > git bisect good 30d46827c2744f56bb31460007f2d16455f10720 š# 02:41 ššš300+ ššš107 šMerge tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator > git bisect good 8308756f45a12e2ff4f7749c2694fc83cdef0be9 š# 03:11 ššš300+ ššš101 šMerge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > git bisect good 4ac30cf74b308fb01338e660d3471cd490a7958a š# 03:36 ššš300+ šššš25 šperf tools: Do not use __perf_session__process_events() directly > git bisect šbad 9659e1eeee28f7025b6545934d644d19e9c6e603 š# 03:48 ššššš0- ššššš2 šsched/deadline: Remove cpu_active_mask from cpudl_find() > git bisect good 3847b272248a3a4ed70d20392cc0454917f7713b š# 04:23 ššš300+ šššš89 šMerge branch 'sched/urgent' into sched/core > git bisect good 4c195c8a1967ff8bee13a811518a99db04618ab7 š# 04:59 ššš300+ ššš118 šMerge branch 'sched/urgent' into sched/core, to merge fixes before applying new patches > git bisect šbad 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d š# 05:10 šššš39- ššš193 šsched/deadline: Fix stale yield state > git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb š# 05:34 ššš300+ ššš103 šsched/deadline: Fix hrtick for a non-leftmost task > # first bad commit: [1019a359d3dc4b64d0e1e5a5efcb725d5e83994d] sched/deadline: Fix stale yield state > git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb š# 05:55 ššš900+ ššš314 šsched/deadline: Fix hrtick for a non-leftmost task > # extra tests with DEBUG_INFO > git bisect good 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d š# 06:11 ššš900+ ššššš0 šsched/deadline: Fix stale yield state > # extra tests on HEAD of linux-devel/devel-xian-smoke-201502152116 > git bisect šbad 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe š# 06:11 ššššš0- ššššš4 š0day head guard for 'devel-xian-smoke-201502152116' > # extra tests on tree/branch linus/master > git bisect šbad a9724125ad014decf008d782e60447c811391326 š# 06:11 ššššš0- ššššš1 šMerge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > # extra tests with first bad commit reverted > # extra tests on tree/branch linus/master > git bisect šbad a9724125ad014decf008d782e60447c811391326 š# 06:16 ššššš0- ššššš1 šMerge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > # extra tests on tree/branch next/master > git bisect šbad b8acf73194186a5cba86812eb4ba17b897f0e13e š# 06:26 ššššš0- šššš35 šAdd linux-next specific files for 20150213 > > This script may reproduce the error. > > ---------------------------------------------------------------------------- > #!/bin/bash > > kernel=$1 > > kvm=( > ššššššššqemu-system-x86_64 > šššššššš-cpu kvm64 > šššššššš-enable-kvm > šššššššš-kernel $kernel > šššššššš-m 320 > šššššššš-smp 1 > šššššššš-net nic,vlan=1,model=e1000 > šššššššš-net user,vlan=1 > šššššššš-boot order=nc > šššššššš-no-reboot > šššššššš-watchdog i6300esb > šššššššš-rtc base=localtime > šššššššš-serial stdio > šššššššš-display none > šššššššš-monitor null > ) > > append=( > ššššššššhung_task_panic=1 > ššššššššearlyprintk=ttyS0,115200 > ššššššššrd.udev.log-priority=err > ššššššššsystemd.log_target=journal > ššššššššsystemd.log_level=warning > ššššššššdebug > ššššššššapic=debug > ššššššššsysrq_always_enabled > ššššššššrcupdate.rcu_cpu_stall_timeout=100 > ššššššššpanic=-1 > ššššššššsoftlockup_panic=1 > ššššššššnmi_watchdog=panic > ššššššššoops=panic > ššššššššload_ramdisk=2 > ššššššššprompt_ramdisk=0 > ššššššššconsole=ttyS0,115200 > ššššššššconsole=tty0 > ššššššššvga=normal > ššššššššroot=/dev/ram0 > ššššššššrw > ššššššššdrbd.minor_count=8 > ) > > "${kvm[@]}" --append "${append[*]}" > ---------------------------------------------------------------------------- Don't we have the following problem? current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() ...timer fires... dl_task_timer() enqueue_task_dl() /* queues on dl_rq */ ...try_to_wake_up() enqueue_dl_entity() BUG_ON(on_dl_rq()) Compile-tested fix: kernel/sched/deadline.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index a027799..1b78598 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer) sched_clock_tick(); update_rq_clock(rq); + if (!p->on_rq) { + replenish_dl_entity(dl_se, dl_se); + goto unlock; + } enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); if (dl_task(rq->curr)) check_preempt_curr_dl(rq, p, 0);