All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirill Tkhai <tkhai@yandex.ru>
To: lkp@lists.01.org
Subject: Re: [sched/deadline] kernel BUG at kernel/sched/deadline.c:805!
Date: Mon, 16 Feb 2015 15:38:34 +0300	[thread overview]
Message-ID: <1374601424090314@web4j.yandex.ru> (raw)
In-Reply-To: <9846861424087552@web13j.yandex.ru>

[-- Attachment #1: Type: text/plain, Size: 18662 bytes --]

16.02.2015, 14:52, "Kirill Tkhai" <tkhai@yandex.ru>:
> 16.02.2015, 10:21, "Fengguang Wu" <fengguang.wu@intel.com>:
>>  Hi Peter,
>>
>>  We got the below dmesg and the first bad commit is
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git revert-d347efeb16d3d5150cb7f8d50b05f388b572840e-1019a359d3dc4b64d0e1e5a5efcb725d5e83994d
>>
>>  commit 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d
>>  Author:     Peter Zijlstra <peterz@infradead.org>
>>  AuthorDate: Wed Nov 26 08:44:03 2014 +0800
>>  Commit:     Ingo Molnar <mingo@kernel.org>
>>  CommitDate: Wed Feb 4 07:52:26 2015 +0100
>>
>>      sched/deadline: Fix stale yield state
>>
>>      When we fail to start the deadline timer in update_curr_dl(), we
>>      forget to clear ->dl_yielded, resulting in wrecked time keeping.
>>
>>      Since the natural place to clear both ->dl_yielded and ->dl_throttled
>>      is in replenish_dl_entity(); both are after all waiting for that event;
>>      make it so.
>>
>>      Luckily since 67dfa1b756f2 ("sched/deadline: Implement
>>      cancel_dl_timer() to use in switched_from_dl()") the
>>      task_on_rq_queued() condition in dl_task_timer() must be true, and can
>>      therefore call enqueue_task_dl() unconditionally.
>>
>>      Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>      Cc: Kirill Tkhai <ktkhai@parallels.com>
>>      Cc: Juri Lelli <juri.lelli@arm.com>
>>      Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>      Link: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li(a)linux.intel.com
>>      Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>
>>  ===================================================
>>  PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
>>  ===================================================
>>
>>  +------------------------------------------------+------------+------------+------------+
>>  |                                                | a7bebf4887 | 1019a359d3 | 58e0e18cb8 |
>>  +------------------------------------------------+------------+------------+------------+
>>  | boot_successes                                 | 886        | 54         | 10         |
>>  | boot_failures                                  | 314        | 246        | 4          |
>>  | BUG:kernel_boot_crashed                        | 48         | 20         | 1          |
>>  | BUG:kernel_test_crashed                        | 16         | 2          |            |
>>  | BUG:kernel_boot_hang                           | 213        | 43         |            |
>>  | Kernel_panic-not_syncing:softlockup:hung_tasks | 37         | 3          |            |
>>  | backtrace:ring_buffer_consumer_thread          | 17         | 3          |            |
>>  | EIP_is_at_get_node_by_name                     | 4          |            |            |
>>  | backtrace:gcov_enable_events                   | 17         |            |            |
>>  | backtrace:gcov_fs_init                         | 17         |            |            |
>>  | backtrace:kernel_init_freeable                 | 20         | 178        | 3          |
>>  | EIP_is_at_strcmp                               | 7          |            |            |
>>  | EIP_is_at__slab_alloc                          | 1          |            |            |
>>  | backtrace:__ring_buffer_alloc                  | 1          |            |            |
>>  | backtrace:ring_buffer_benchmark_init           | 1          |            |            |
>>  | EIP_is_at_lock_release                         | 1          |            |            |
>>  | backtrace:debug_mutex_init                     | 1          |            |            |
>>  | EIP_is_at_strncmp                              | 1          |            |            |
>>  | EIP_is_at_do_raw_spin_unlock                   | 1          |            |            |
>>  | EIP_is_at__do_softirq                          | 6          | 2          |            |
>>  | EIP_is_at_gcov_info_filename                   | 1          |            |            |
>>  | EIP_is_at_ring_buffer_consume                  | 8          | 1          |            |
>>  | EIP_is_at_raw_spin_unlock_irqrestore           | 1          |            |            |
>>  | EIP_is_at_ftrace_ops_list_func                 | 1          |            |            |
>>  | backtrace:event_trace_self_tests               | 2          |            |            |
>>  | backtrace:event_trace_self_tests_init          | 2          |            |            |
>>  | EIP_is_at_ftrace_likely_update                 | 1          |            |            |
>>  | EIP_is_at_rb_reserve_next_event                | 1          |            |            |
>>  | EIP_is_at_raw_spin_unlock_irq                  | 1          |            |            |
>>  | kernel_BUG_at_kernel/sched/deadline.c          | 0          | 178        | 3          |
>>  | invalid_opcode                                 | 0          | 178        | 3          |
>>  | EIP_is_at_enqueue_task_dl                      | 0          | 178        | 3          |
>>  | Kernel_panic-not_syncing:Fatal_exception       | 0          | 178        | 3          |
>>  | backtrace:register_tracer                      | 0          | 178        | 3          |
>>  | backtrace:init_wakeup_tracer                   | 0          | 178        | 3          |
>>  +------------------------------------------------+------------+------------+------------+
>>
>>  [    6.432010] Testing ftrace regs: PASSED
>>  [    6.530393] Testing tracer wakeup:
>>  [    7.648615] ------------[ cut here ]------------
>>  [    7.649000] kernel BUG at kernel/sched/deadline.c:805!
>>  [    7.649000] invalid opcode: 0000 [#1] SMP
>>  [    7.649000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc6-00041-g1019a35 #17
>>  [    7.649000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
>>  [    7.649000] task: d3348000 ti: d3344000 task.ti: d3344000
>>  [    7.649000] EIP: 0060:[<c11969a1>] EFLAGS: 00010046 CPU: 0
>>  [    7.649000] EIP is at enqueue_task_dl+0x341/0x1860
>>  [    7.649000] EAX: d2940101 EBX: 00000001 ECX: 00000000 EDX: 0085a491
>>  [    7.649000] ESI: d2940000 EDI: d2940101 EBP: d3345e08 ESP: d3345dc0
>>  [    7.649000]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>>  [    7.649000] CR0: 8005003b CR2: ffffffff CR3: 04e8b000 CR4: 000006d0
>>  [    7.649000] Stack:
>>  [    7.649000]  c1170353 00000002 d3345de0 c11ab53c d391ec10 00000046 d33485c0 d3348000
>>  [    7.649000]  00000002 d29401b8 d391ec00 d29401b8 00000046 00000005 00000000 d2940000
>>  [    7.649000]  d391ec00 c36efc80 d3345e20 c115991a 00000005 00000000 d391ec00 00000000
>>  [    7.649000] Call Trace:
>>  [    7.649000]  [<c1170353>] ? sched_clock_cpu+0x293/0x2e0
>>  [    7.649000]  [<c11ab53c>] ? __lock_is_held+0x6c/0xa0
>>  [    7.649000]  [<c115991a>] enqueue_task+0xba/0x220
>>  [    7.649000]  [<c115aa4c>] activate_task+0x4c/0x80
>>  [    7.649000]  [<c115aaa1>] ttwu_activate+0x21/0x90
>>  [    7.649000]  [<c115bd84>] ttwu_do_activate+0x64/0x90
>>  [    7.649000]  [<c11689b6>] try_to_wake_up+0x366/0x8b0
>>  [    7.649000]  [<c1169000>] wake_up_process+0xa0/0xe0
>>  [    7.649000]  [<c114095b>] kthread_stop+0x21b/0xd90
>>  [    7.649000]  [<c12c0b27>] trace_selftest_startup_wakeup+0x3b7/0x4b0
>>  [    7.649000]  [<c36b1690>] ? wait_for_completion+0x1f0/0x290
>>  [    7.649000]  [<c12c1d04>] register_tracer+0x2d4/0x700
>>  [    7.649000]  [<c4bd11a4>] init_wakeup_tracer+0x1d/0xea
>>  [    7.649000]  [<c4b8f12e>] do_one_initcall+0x278/0x44c
>>  [    7.649000]  [<c4bd1187>] ? init_function_trace+0x1f8/0x1f8
>>  [    7.649000]  [<c4b8f565>] kernel_init_freeable+0x263/0x424
>>  [    7.649000]  [<c3663b4c>] kernel_init+0x1c/0x2a0
>>  [    7.649000]  [<c36c2181>] ret_from_kernel_thread+0x21/0x30
>>  [    7.649000]  [<c3663b30>] ? rest_init+0x330/0x330
>>  [    7.649000] Code: da e8 94 96 13 00 8b 04 9d 84 f6 77 c4 83 c0 01 89 04 9d 84 f6 77 c4 89 f8 84 c0 74 4d 83 05 c8 14 f2 c4 01 83 15 cc 14 f2 c4 00 <0f> 0b 83 05 d0 14 f2 c4 01 83 15 d4 14 f2 c4 00 8d b4 26 00 00
>>  [    7.649000] EIP: [<c11969a1>] enqueue_task_dl+0x341/0x1860 SS:ESP 0068:d3345dc0
>>  [    7.649000] ---[ end trace cb4bdebe33135894 ]---
>>  [    7.649000] Kernel panic - not syncing: Fatal exception
>>
>>  git bisect start 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe bfa76d49576599a4b9f9b7a71f23d73d6dcff735 --
>>  git bisect  bad 30496a920e97b97e16d078c520eef323344ef997  # 23:32      0-     67  Merge 'amirv/for-upstream' into devel-xian-smoke-201502152116
>>  git bisect good d170f4905ee6f93eb12f87d018dd4b9acea18f2c  # 00:27    300+    120  0day base guard for 'devel-xian-smoke-201502152116'
>>  git bisect  bad c5ce28df0e7c01a1de23c36ebdefcd803f2b6cbb  # 01:11      0-     72  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
>>  git bisect good 57ee062e4baa9532e5a3835a2d185590a3fc8bd3  # 01:51    300+     94  Merge branch 'r8152'
>>  git bisect  bad c08f8467939e7d2eebcba7cf2330242c4f53f2f7  # 02:10      0-    103  Merge tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
>>  git bisect  bad 5b9b28a63f2e47dac5ff3a2503bfe3ade8796aa0  # 02:24      0-      5  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>  git bisect good 30d46827c2744f56bb31460007f2d16455f10720  # 02:41    300+    107  Merge tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
>>  git bisect good 8308756f45a12e2ff4f7749c2694fc83cdef0be9  # 03:11    300+    101  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>  git bisect good 4ac30cf74b308fb01338e660d3471cd490a7958a  # 03:36    300+     25  perf tools: Do not use __perf_session__process_events() directly
>>  git bisect  bad 9659e1eeee28f7025b6545934d644d19e9c6e603  # 03:48      0-      2  sched/deadline: Remove cpu_active_mask from cpudl_find()
>>  git bisect good 3847b272248a3a4ed70d20392cc0454917f7713b  # 04:23    300+     89  Merge branch 'sched/urgent' into sched/core
>>  git bisect good 4c195c8a1967ff8bee13a811518a99db04618ab7  # 04:59    300+    118  Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new patches
>>  git bisect  bad 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d  # 05:10     39-    193  sched/deadline: Fix stale yield state
>>  git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb  # 05:34    300+    103  sched/deadline: Fix hrtick for a non-leftmost task
>>  # first bad commit: [1019a359d3dc4b64d0e1e5a5efcb725d5e83994d] sched/deadline: Fix stale yield state
>>  git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb  # 05:55    900+    314  sched/deadline: Fix hrtick for a non-leftmost task
>>  # extra tests with DEBUG_INFO
>>  git bisect good 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d  # 06:11    900+      0  sched/deadline: Fix stale yield state
>>  # extra tests on HEAD of linux-devel/devel-xian-smoke-201502152116
>>  git bisect  bad 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe  # 06:11      0-      4  0day head guard for 'devel-xian-smoke-201502152116'
>>  # extra tests on tree/branch linus/master
>>  git bisect  bad a9724125ad014decf008d782e60447c811391326  # 06:11      0-      1  Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>  # extra tests with first bad commit reverted
>>  # extra tests on tree/branch linus/master
>>  git bisect  bad a9724125ad014decf008d782e60447c811391326  # 06:16      0-      1  Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>  # extra tests on tree/branch next/master
>>  git bisect  bad b8acf73194186a5cba86812eb4ba17b897f0e13e  # 06:26      0-     35  Add linux-next specific files for 20150213
>>
>>  This script may reproduce the error.
>>
>>  ----------------------------------------------------------------------------
>>  #!/bin/bash
>>
>>  kernel=$1
>>
>>  kvm=(
>>          qemu-system-x86_64
>>          -cpu kvm64
>>          -enable-kvm
>>          -kernel $kernel
>>          -m 320
>>          -smp 1
>>          -net nic,vlan=1,model=e1000
>>          -net user,vlan=1
>>          -boot order=nc
>>          -no-reboot
>>          -watchdog i6300esb
>>          -rtc base=localtime
>>          -serial stdio
>>          -display none
>>          -monitor null
>>  )
>>
>>  append=(
>>          hung_task_panic=1
>>          earlyprintk=ttyS0,115200
>>          rd.udev.log-priority=err
>>          systemd.log_target=journal
>>          systemd.log_level=warning
>>          debug
>>          apic=debug
>>          sysrq_always_enabled
>>          rcupdate.rcu_cpu_stall_timeout=100
>>          panic=-1
>>          softlockup_panic=1
>>          nmi_watchdog=panic
>>          oops=panic
>>          load_ramdisk=2
>>          prompt_ramdisk=0
>>          console=ttyS0,115200
>>          console=tty0
>>          vga=normal
>>          root=/dev/ram0
>>          rw
>>          drbd.minor_count=8
>>  )
>>
>>  "${kvm[@]}" --append "${append[*]}"
>>  ----------------------------------------------------------------------------
>
> Don't we have the following problem?
>
> current->state = TASK_INTERRUPTIBLE;
> schedule()
>     deactivate_task()
>         dequeue_task_dl()
>             update_curr_dl()
>                 start_dl_timer()
>             __dequeue_task_dl()
>
> ...timer fires...
>
> dl_task_timer()
>     enqueue_task_dl() /* queues on dl_rq */
>
> ...try_to_wake_up()
>
>    enqueue_dl_entity()
>        BUG_ON(on_dl_rq())
>
> Compile-tested fix:
>
>  kernel/sched/deadline.c | 4 ++++
>  1 file changed, 4 insertions(+)
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a027799..1b78598 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>
>          sched_clock_tick();
>          update_rq_clock(rq);
> + if (!p->on_rq) {
> + replenish_dl_entity(dl_se, dl_se);
> + goto unlock;
> + }
>          enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>          if (dl_task(rq->curr))
>                  check_preempt_curr_dl(rq, p, 0);

We shouldn't enqueue migrating tasks. Please, try this one instead ;)

[PATCH] sched/dl: Prevent enqueue of a sleeping task in dl_task_timer()

A deadline task may be throttled and dequeued at the same time.
This happens, when it becomes throttled in schedule(), which
is called to go to sleep:

current->state = TASK_INTERRUPTIBLE;
schedule()
    deactivate_task()
        dequeue_task_dl()
            update_curr_dl()
                start_dl_timer()
            __dequeue_task_dl()
    prev->on_rq = 0;
    
Later the timer fires, but the task is still dequeued:
    
dl_task_timer()
    enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */

Someone wakes it up:

try_to_wake_up()

    enqueue_dl_entity()
        BUG_ON(on_dl_rq())

Patch fixes this problem, it prevents queueing !on_rq tasks
on dl_rq.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a027799..784ba68 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 
 	sched_clock_tick();
 	update_rq_clock(rq);
+	if (!task_on_rq_queued(p)) {
+		replenish_dl_entity(dl_se, dl_se);
+		goto unlock;
+	}
 	enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
 	if (dl_task(rq->curr))
 		check_preempt_curr_dl(rq, p, 0);

WARNING: multiple messages have this Message-ID (diff)
From: Kirill Tkhai <tkhai@yandex.ru>
To: Fengguang Wu <fengguang.wu@intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>, LKP <lkp@01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [sched/deadline] kernel BUG at kernel/sched/deadline.c:805!
Date: Mon, 16 Feb 2015 15:38:34 +0300	[thread overview]
Message-ID: <1374601424090314@web4j.yandex.ru> (raw)
In-Reply-To: <9846861424087552@web13j.yandex.ru>

16.02.2015, 14:52, "Kirill Tkhai" <tkhai@yandex.ru>:
> 16.02.2015, 10:21, "Fengguang Wu" <fengguang.wu@intel.com>:
>>  Hi Peter,
>>
>>  We got the below dmesg and the first bad commit is
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git revert-d347efeb16d3d5150cb7f8d50b05f388b572840e-1019a359d3dc4b64d0e1e5a5efcb725d5e83994d
>>
>>  commit 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d
>>  Author:     Peter Zijlstra <peterz@infradead.org>
>>  AuthorDate: Wed Nov 26 08:44:03 2014 +0800
>>  Commit:     Ingo Molnar <mingo@kernel.org>
>>  CommitDate: Wed Feb 4 07:52:26 2015 +0100
>>
>>      sched/deadline: Fix stale yield state
>>
>>      When we fail to start the deadline timer in update_curr_dl(), we
>>      forget to clear ->dl_yielded, resulting in wrecked time keeping.
>>
>>      Since the natural place to clear both ->dl_yielded and ->dl_throttled
>>      is in replenish_dl_entity(); both are after all waiting for that event;
>>      make it so.
>>
>>      Luckily since 67dfa1b756f2 ("sched/deadline: Implement
>>      cancel_dl_timer() to use in switched_from_dl()") the
>>      task_on_rq_queued() condition in dl_task_timer() must be true, and can
>>      therefore call enqueue_task_dl() unconditionally.
>>
>>      Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>      Cc: Kirill Tkhai <ktkhai@parallels.com>
>>      Cc: Juri Lelli <juri.lelli@arm.com>
>>      Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>      Link: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li@linux.intel.com
>>      Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>
>>  ===================================================
>>  PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
>>  ===================================================
>>
>>  +------------------------------------------------+------------+------------+------------+
>>  |                                                | a7bebf4887 | 1019a359d3 | 58e0e18cb8 |
>>  +------------------------------------------------+------------+------------+------------+
>>  | boot_successes                                 | 886        | 54         | 10         |
>>  | boot_failures                                  | 314        | 246        | 4          |
>>  | BUG:kernel_boot_crashed                        | 48         | 20         | 1          |
>>  | BUG:kernel_test_crashed                        | 16         | 2          |            |
>>  | BUG:kernel_boot_hang                           | 213        | 43         |            |
>>  | Kernel_panic-not_syncing:softlockup:hung_tasks | 37         | 3          |            |
>>  | backtrace:ring_buffer_consumer_thread          | 17         | 3          |            |
>>  | EIP_is_at_get_node_by_name                     | 4          |            |            |
>>  | backtrace:gcov_enable_events                   | 17         |            |            |
>>  | backtrace:gcov_fs_init                         | 17         |            |            |
>>  | backtrace:kernel_init_freeable                 | 20         | 178        | 3          |
>>  | EIP_is_at_strcmp                               | 7          |            |            |
>>  | EIP_is_at__slab_alloc                          | 1          |            |            |
>>  | backtrace:__ring_buffer_alloc                  | 1          |            |            |
>>  | backtrace:ring_buffer_benchmark_init           | 1          |            |            |
>>  | EIP_is_at_lock_release                         | 1          |            |            |
>>  | backtrace:debug_mutex_init                     | 1          |            |            |
>>  | EIP_is_at_strncmp                              | 1          |            |            |
>>  | EIP_is_at_do_raw_spin_unlock                   | 1          |            |            |
>>  | EIP_is_at__do_softirq                          | 6          | 2          |            |
>>  | EIP_is_at_gcov_info_filename                   | 1          |            |            |
>>  | EIP_is_at_ring_buffer_consume                  | 8          | 1          |            |
>>  | EIP_is_at_raw_spin_unlock_irqrestore           | 1          |            |            |
>>  | EIP_is_at_ftrace_ops_list_func                 | 1          |            |            |
>>  | backtrace:event_trace_self_tests               | 2          |            |            |
>>  | backtrace:event_trace_self_tests_init          | 2          |            |            |
>>  | EIP_is_at_ftrace_likely_update                 | 1          |            |            |
>>  | EIP_is_at_rb_reserve_next_event                | 1          |            |            |
>>  | EIP_is_at_raw_spin_unlock_irq                  | 1          |            |            |
>>  | kernel_BUG_at_kernel/sched/deadline.c          | 0          | 178        | 3          |
>>  | invalid_opcode                                 | 0          | 178        | 3          |
>>  | EIP_is_at_enqueue_task_dl                      | 0          | 178        | 3          |
>>  | Kernel_panic-not_syncing:Fatal_exception       | 0          | 178        | 3          |
>>  | backtrace:register_tracer                      | 0          | 178        | 3          |
>>  | backtrace:init_wakeup_tracer                   | 0          | 178        | 3          |
>>  +------------------------------------------------+------------+------------+------------+
>>
>>  [    6.432010] Testing ftrace regs: PASSED
>>  [    6.530393] Testing tracer wakeup:
>>  [    7.648615] ------------[ cut here ]------------
>>  [    7.649000] kernel BUG at kernel/sched/deadline.c:805!
>>  [    7.649000] invalid opcode: 0000 [#1] SMP
>>  [    7.649000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc6-00041-g1019a35 #17
>>  [    7.649000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
>>  [    7.649000] task: d3348000 ti: d3344000 task.ti: d3344000
>>  [    7.649000] EIP: 0060:[<c11969a1>] EFLAGS: 00010046 CPU: 0
>>  [    7.649000] EIP is at enqueue_task_dl+0x341/0x1860
>>  [    7.649000] EAX: d2940101 EBX: 00000001 ECX: 00000000 EDX: 0085a491
>>  [    7.649000] ESI: d2940000 EDI: d2940101 EBP: d3345e08 ESP: d3345dc0
>>  [    7.649000]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>>  [    7.649000] CR0: 8005003b CR2: ffffffff CR3: 04e8b000 CR4: 000006d0
>>  [    7.649000] Stack:
>>  [    7.649000]  c1170353 00000002 d3345de0 c11ab53c d391ec10 00000046 d33485c0 d3348000
>>  [    7.649000]  00000002 d29401b8 d391ec00 d29401b8 00000046 00000005 00000000 d2940000
>>  [    7.649000]  d391ec00 c36efc80 d3345e20 c115991a 00000005 00000000 d391ec00 00000000
>>  [    7.649000] Call Trace:
>>  [    7.649000]  [<c1170353>] ? sched_clock_cpu+0x293/0x2e0
>>  [    7.649000]  [<c11ab53c>] ? __lock_is_held+0x6c/0xa0
>>  [    7.649000]  [<c115991a>] enqueue_task+0xba/0x220
>>  [    7.649000]  [<c115aa4c>] activate_task+0x4c/0x80
>>  [    7.649000]  [<c115aaa1>] ttwu_activate+0x21/0x90
>>  [    7.649000]  [<c115bd84>] ttwu_do_activate+0x64/0x90
>>  [    7.649000]  [<c11689b6>] try_to_wake_up+0x366/0x8b0
>>  [    7.649000]  [<c1169000>] wake_up_process+0xa0/0xe0
>>  [    7.649000]  [<c114095b>] kthread_stop+0x21b/0xd90
>>  [    7.649000]  [<c12c0b27>] trace_selftest_startup_wakeup+0x3b7/0x4b0
>>  [    7.649000]  [<c36b1690>] ? wait_for_completion+0x1f0/0x290
>>  [    7.649000]  [<c12c1d04>] register_tracer+0x2d4/0x700
>>  [    7.649000]  [<c4bd11a4>] init_wakeup_tracer+0x1d/0xea
>>  [    7.649000]  [<c4b8f12e>] do_one_initcall+0x278/0x44c
>>  [    7.649000]  [<c4bd1187>] ? init_function_trace+0x1f8/0x1f8
>>  [    7.649000]  [<c4b8f565>] kernel_init_freeable+0x263/0x424
>>  [    7.649000]  [<c3663b4c>] kernel_init+0x1c/0x2a0
>>  [    7.649000]  [<c36c2181>] ret_from_kernel_thread+0x21/0x30
>>  [    7.649000]  [<c3663b30>] ? rest_init+0x330/0x330
>>  [    7.649000] Code: da e8 94 96 13 00 8b 04 9d 84 f6 77 c4 83 c0 01 89 04 9d 84 f6 77 c4 89 f8 84 c0 74 4d 83 05 c8 14 f2 c4 01 83 15 cc 14 f2 c4 00 <0f> 0b 83 05 d0 14 f2 c4 01 83 15 d4 14 f2 c4 00 8d b4 26 00 00
>>  [    7.649000] EIP: [<c11969a1>] enqueue_task_dl+0x341/0x1860 SS:ESP 0068:d3345dc0
>>  [    7.649000] ---[ end trace cb4bdebe33135894 ]---
>>  [    7.649000] Kernel panic - not syncing: Fatal exception
>>
>>  git bisect start 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe bfa76d49576599a4b9f9b7a71f23d73d6dcff735 --
>>  git bisect  bad 30496a920e97b97e16d078c520eef323344ef997  # 23:32      0-     67  Merge 'amirv/for-upstream' into devel-xian-smoke-201502152116
>>  git bisect good d170f4905ee6f93eb12f87d018dd4b9acea18f2c  # 00:27    300+    120  0day base guard for 'devel-xian-smoke-201502152116'
>>  git bisect  bad c5ce28df0e7c01a1de23c36ebdefcd803f2b6cbb  # 01:11      0-     72  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
>>  git bisect good 57ee062e4baa9532e5a3835a2d185590a3fc8bd3  # 01:51    300+     94  Merge branch 'r8152'
>>  git bisect  bad c08f8467939e7d2eebcba7cf2330242c4f53f2f7  # 02:10      0-    103  Merge tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
>>  git bisect  bad 5b9b28a63f2e47dac5ff3a2503bfe3ade8796aa0  # 02:24      0-      5  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>  git bisect good 30d46827c2744f56bb31460007f2d16455f10720  # 02:41    300+    107  Merge tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
>>  git bisect good 8308756f45a12e2ff4f7749c2694fc83cdef0be9  # 03:11    300+    101  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>  git bisect good 4ac30cf74b308fb01338e660d3471cd490a7958a  # 03:36    300+     25  perf tools: Do not use __perf_session__process_events() directly
>>  git bisect  bad 9659e1eeee28f7025b6545934d644d19e9c6e603  # 03:48      0-      2  sched/deadline: Remove cpu_active_mask from cpudl_find()
>>  git bisect good 3847b272248a3a4ed70d20392cc0454917f7713b  # 04:23    300+     89  Merge branch 'sched/urgent' into sched/core
>>  git bisect good 4c195c8a1967ff8bee13a811518a99db04618ab7  # 04:59    300+    118  Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new patches
>>  git bisect  bad 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d  # 05:10     39-    193  sched/deadline: Fix stale yield state
>>  git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb  # 05:34    300+    103  sched/deadline: Fix hrtick for a non-leftmost task
>>  # first bad commit: [1019a359d3dc4b64d0e1e5a5efcb725d5e83994d] sched/deadline: Fix stale yield state
>>  git bisect good a7bebf488791aa1036f3e6629daf01d01f705dcb  # 05:55    900+    314  sched/deadline: Fix hrtick for a non-leftmost task
>>  # extra tests with DEBUG_INFO
>>  git bisect good 1019a359d3dc4b64d0e1e5a5efcb725d5e83994d  # 06:11    900+      0  sched/deadline: Fix stale yield state
>>  # extra tests on HEAD of linux-devel/devel-xian-smoke-201502152116
>>  git bisect  bad 58e0e18cb8b1c3351cb70b3cf78d4e3785a181fe  # 06:11      0-      4  0day head guard for 'devel-xian-smoke-201502152116'
>>  # extra tests on tree/branch linus/master
>>  git bisect  bad a9724125ad014decf008d782e60447c811391326  # 06:11      0-      1  Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>  # extra tests with first bad commit reverted
>>  # extra tests on tree/branch linus/master
>>  git bisect  bad a9724125ad014decf008d782e60447c811391326  # 06:16      0-      1  Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>  # extra tests on tree/branch next/master
>>  git bisect  bad b8acf73194186a5cba86812eb4ba17b897f0e13e  # 06:26      0-     35  Add linux-next specific files for 20150213
>>
>>  This script may reproduce the error.
>>
>>  ----------------------------------------------------------------------------
>>  #!/bin/bash
>>
>>  kernel=$1
>>
>>  kvm=(
>>          qemu-system-x86_64
>>          -cpu kvm64
>>          -enable-kvm
>>          -kernel $kernel
>>          -m 320
>>          -smp 1
>>          -net nic,vlan=1,model=e1000
>>          -net user,vlan=1
>>          -boot order=nc
>>          -no-reboot
>>          -watchdog i6300esb
>>          -rtc base=localtime
>>          -serial stdio
>>          -display none
>>          -monitor null
>>  )
>>
>>  append=(
>>          hung_task_panic=1
>>          earlyprintk=ttyS0,115200
>>          rd.udev.log-priority=err
>>          systemd.log_target=journal
>>          systemd.log_level=warning
>>          debug
>>          apic=debug
>>          sysrq_always_enabled
>>          rcupdate.rcu_cpu_stall_timeout=100
>>          panic=-1
>>          softlockup_panic=1
>>          nmi_watchdog=panic
>>          oops=panic
>>          load_ramdisk=2
>>          prompt_ramdisk=0
>>          console=ttyS0,115200
>>          console=tty0
>>          vga=normal
>>          root=/dev/ram0
>>          rw
>>          drbd.minor_count=8
>>  )
>>
>>  "${kvm[@]}" --append "${append[*]}"
>>  ----------------------------------------------------------------------------
>
> Don't we have the following problem?
>
> current->state = TASK_INTERRUPTIBLE;
> schedule()
>     deactivate_task()
>         dequeue_task_dl()
>             update_curr_dl()
>                 start_dl_timer()
>             __dequeue_task_dl()
>
> ...timer fires...
>
> dl_task_timer()
>     enqueue_task_dl() /* queues on dl_rq */
>
> ...try_to_wake_up()
>
>    enqueue_dl_entity()
>        BUG_ON(on_dl_rq())
>
> Compile-tested fix:
>
>  kernel/sched/deadline.c | 4 ++++
>  1 file changed, 4 insertions(+)
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a027799..1b78598 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>
>          sched_clock_tick();
>          update_rq_clock(rq);
> + if (!p->on_rq) {
> + replenish_dl_entity(dl_se, dl_se);
> + goto unlock;
> + }
>          enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>          if (dl_task(rq->curr))
>                  check_preempt_curr_dl(rq, p, 0);

We shouldn't enqueue migrating tasks. Please, try this one instead ;)

[PATCH] sched/dl: Prevent enqueue of a sleeping task in dl_task_timer()

A deadline task may be throttled and dequeued at the same time.
This happens, when it becomes throttled in schedule(), which
is called to go to sleep:

current->state = TASK_INTERRUPTIBLE;
schedule()
    deactivate_task()
        dequeue_task_dl()
            update_curr_dl()
                start_dl_timer()
            __dequeue_task_dl()
    prev->on_rq = 0;
    
Later the timer fires, but the task is still dequeued:
    
dl_task_timer()
    enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */

Someone wakes it up:

try_to_wake_up()

    enqueue_dl_entity()
        BUG_ON(on_dl_rq())

Patch fixes this problem, it prevents queueing !on_rq tasks
on dl_rq.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a027799..784ba68 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -541,6 +541,10 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 
 	sched_clock_tick();
 	update_rq_clock(rq);
+	if (!task_on_rq_queued(p)) {
+		replenish_dl_entity(dl_se, dl_se);
+		goto unlock;
+	}
 	enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
 	if (dl_task(rq->curr))
 		check_preempt_curr_dl(rq, p, 0);

  reply	other threads:[~2015-02-16 12:38 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-16  7:20 [sched/deadline] kernel BUG at kernel/sched/deadline.c:805! Fengguang Wu
2015-02-16  7:20 ` Fengguang Wu
2015-02-16 11:52 ` Kirill Tkhai
2015-02-16 11:52   ` Kirill Tkhai
2015-02-16 12:38   ` Kirill Tkhai [this message]
2015-02-16 12:38     ` Kirill Tkhai
2015-02-16 13:08     ` Peter Zijlstra
2015-02-16 13:08       ` Peter Zijlstra
2015-02-16 14:44       ` Peter Zijlstra
2015-02-16 14:44         ` Peter Zijlstra
2015-02-16 15:12         ` Juri Lelli
2015-02-16 15:12           ` Juri Lelli
2015-02-18 17:06     ` [tip:sched/core] sched/dl: Prevent enqueue of a sleeping task in dl_task_timer() tip-bot for Kirill Tkhai
2015-02-16 12:49 ` [sched/deadline] kernel BUG at kernel/sched/deadline.c:805! Peter Zijlstra
2015-02-16 12:49   ` Peter Zijlstra
2015-02-16 13:25   ` Fengguang Wu
2015-02-16 13:25     ` Fengguang Wu
2015-02-16 13:29     ` Peter Zijlstra
2015-02-16 13:29       ` Peter Zijlstra
2015-02-16 13:37     ` Fengguang Wu
2015-02-16 13:37       ` Fengguang Wu
2015-02-18 18:00       ` Ingo Molnar
2015-02-18 18:00         ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1374601424090314@web4j.yandex.ru \
    --to=tkhai@yandex.ru \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.