From: Sasha Levin <sasha.levin@oracle.com>
To: Kirill Tkhai <tkhai@yandex.ru>,
Michael wang <wangyun@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
"ktkhai@parallels.com" <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@kernel.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: sched: hang in migrate_swap
Date: Mon, 12 May 2014 14:48:37 -0400 [thread overview]
Message-ID: <53711785.5010504@oracle.com> (raw)
In-Reply-To: <1442521397229373@web20m.yandex.ru>
On 04/11/2014 11:16 AM, Kirill Tkhai wrote:
> 11.04.2014, 18:33, "Sasha Levin" <sasha.levin@oracle.com>:
>> > On 04/10/2014 09:38 AM, Kirill Tkhai wrote:
>> >
>>> >> 10.04.2014, 11:00, "Michael wang" <wangyun@linux.vnet.ibm.com>:
>>>>> >>>> On 04/10/2014 11:31 AM, Sasha Levin wrote:
>>>>> >>>> [snip]
>>>>>>> >>>>>> I'd like to re-open this issue. It seems that something broke and I'm
>>>>>>> >>>>>> now seeing the same issues that have gone away 2 months with this patch
>>>>>>> >>>>>> again.
>>>>> >>>> A new mechanism has been designed to move the priority checking inside
>>>>> >>>> idle_balance(), including Kirill who is the designer ;-)
>>> >> Not sure, it's connected with my patch. But looks like, we forgot
>>> >> exactly stop class. Maybe this will help?
>>> >>
>>> >> [PATCH] sched: Checking for stop task appearance when balancing happens
>>> >>
>>> >> Just do it, like we do for other higher priority classes...
>>> >>
>>> >> Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
>> >
>> > I've been running with this patch for the last two days and the hang
>> > seems to be gone.
>> >
>> > I'll leave it running on the weekend and will update again on Monday.
> Thanks for testing.
>
> Sorry for I killed your linuxes by this lost need_resched().
So I'm going to revive this thread again:
[ 3912.086567] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [migration/4:194]
[ 3912.089060] Modules linked in:
[ 3912.090438] irq event stamp: 585772
[ 3912.090605] hardirqs last enabled at (585771): restore_args (arch/x86/kernel/entry_64.S:1033)
[ 3912.090605] hardirqs last disabled at (585772): trace_apic_timer_interrupt (arch/x86/kernel/entry_64.S:1225)
[ 3912.090605] softirqs last enabled at (585770): __do_softirq (arch/x86/include/asm/preempt.h:22 kernel/softirq.c:296)
[ 3912.090605] softirqs last disabled at (585763): irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
[ 3912.090605] CPU: 4 PID: 194 Comm: migration/4 Tainted: G W 3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
[ 3912.090605] task: ffff8801445a8000 ti: ffff880144596000 task.ti: ffff880144596000
[ 3912.090605] RIP: multi_cpu_stop (kernel/stop_machine.c:210)
[ 3912.090605] RSP: 0000:ffff880144597cc8 EFLAGS: 00000293
[ 3912.090605] RAX: 0000000000000001 RBX: ffffffffa458d3fb RCX: 0000000000000034
[ 3912.090605] RDX: ffff8801445a8000 RSI: 0000000000000034 RDI: 0000000020000000
[ 3912.090605] RBP: ffff880144597d08 R08: 0000000000000001 R09: 0000000000000000
[ 3912.090605] R10: ffffffffa4638060 R11: 0000000000000001 R12: ffff880144597c38
[ 3912.090605] R13: 0000000000000001 R14: ffff880144596000 R15: ffff8801445a8000
[ 3912.090605] FS: 0000000000000000(0000) GS:ffff880144c00000(0000) knlGS:0000000000000000
[ 3912.090605] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3912.090605] CR2: 000000000258aff8 CR3: 000000002602c000 CR4: 00000000000006a0
[ 3912.090605] DR0: 00000000006df000 DR1: 00000000006df000 DR2: 0000000000000000
[ 3912.090605] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 3912.090605] Stack:
[ 3912.090605] ffff880144597ce8 0000000000000286 ffff880144597ce8 ffff88015666fa40
[ 3912.090605] ffff880144dd0a00 ffff88015666fa18 ffffffffa1227180 ffff880144dd0a50
[ 3912.090605] ffff880144597dd8 ffffffffa1226e42 ffff880144597d28 ffffffffa11c4aae
[ 3912.090605] Call Trace:
[ 3912.090605] ? queue_stop_cpus_work (kernel/stop_machine.c:171)
[ 3912.090605] cpu_stopper_thread (kernel/stop_machine.c:498)
[ 3912.090605] ? put_lock_stats.isra.12 (arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[ 3912.090605] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 3912.090605] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 3912.090605] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[ 3912.090605] smpboot_thread_fn (kernel/smpboot.c:160)
[ 3912.090605] ? __smpboot_create_thread (kernel/smpboot.c:105)
[ 3912.090605] kthread (kernel/kthread.c:210)
[ 3912.090605] ? complete (kernel/sched/completion.c:35)
[ 3912.090605] ? kthread_create_on_node (kernel/kthread.c:176)
[ 3912.090605] ret_from_fork (arch/x86/kernel/entry_64.S:553)
[ 3912.090605] ? kthread_create_on_node (kernel/kthread.c:176)
[ 3912.090605] Code: 09 06 0f 1f 80 00 00 00 00 4d 0f a3 65 00 45 19 e4 45 85 e4 41 0f 95 c6 e8 bc 10 90 00 41 89 c4 45 31 ed 31 c0 0f 1f 40 00 f3 90 <45> 84 f6 75 11 44 8b 7b 20 41 39 c7 75 20 eb 67 66 0f 1f 44 00
[ 3912.090605] Kernel panic - not syncing: softlockup: hung tasks
[ 3912.090605] CPU: 4 PID: 194 Comm: migration/4 Tainted: G W 3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
[ 3912.090605] ffff8801445a8000 ffff880144c03dd8 ffffffffa453e1ec 0000000000000001
[ 3912.090605] ffffffffa56e9ed0 ffff880144c03e58 ffffffffa452fd6a 0000000000000001
[ 3912.090605] 0000000000000008 ffff880144c03e68 ffff880144c03e08 0000000000000001
[ 3912.090605] Call Trace:
[ 3912.090605] <IRQ> dump_stack (lib/dump_stack.c:52)
[ 3912.090605] panic (kernel/panic.c:119)
[ 3912.090605] ? _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 3912.090605] watchdog_timer_fn (kernel/watchdog.c:372)
[ 3912.090605] __run_hrtimer (kernel/hrtimer.c:1268 (discriminator 2))
[ 3912.090605] ? queue_stop_cpus_work (include/linux/cpumask.h:108 include/linux/cpumask.h:174 kernel/stop_machine.c:375)
[ 3912.090605] ? watchdog (kernel/hung_task.c:191 kernel/hung_task.c:232)
[ 3912.090605] hrtimer_interrupt (kernel/hrtimer.c:1915)
[ 3912.090605] ? queue_stop_cpus_work (include/linux/cpumask.h:108 include/linux/cpumask.h:174 kernel/stop_machine.c:375)
[ 3912.090605] ? queue_stop_cpus_work (include/linux/cpumask.h:108 include/linux/cpumask.h:174 kernel/stop_machine.c:375)
[ 3912.090605] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
[ 3912.090605] smp_trace_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:964 include/linux/jump_label.h:105 arch/x86/include/asm/trace/irq_vectors.h:45 arch/x86/kernel/apic/apic.c:965)
[ 3912.090605] trace_apic_timer_interrupt (arch/x86/kernel/entry_64.S:1225)
[ 3912.090605] <EOI> ? retint_restore_args (arch/x86/kernel/entry_64.S:1033)
[ 3912.090605] ? multi_cpu_stop (kernel/stop_machine.c:210)
[ 3912.090605] ? multi_cpu_stop (include/linux/bitmap.h:280 include/linux/cpumask.h:461 kernel/stop_machine.c:189)
[ 3912.090605] ? queue_stop_cpus_work (kernel/stop_machine.c:171)
[ 3912.090605] cpu_stopper_thread (kernel/stop_machine.c:498)
[ 3912.090605] ? put_lock_stats.isra.12 (arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[ 3912.090605] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 3912.090605] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 3912.090605] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[ 3912.090605] smpboot_thread_fn (kernel/smpboot.c:160)
[ 3912.090605] ? __smpboot_create_thread (kernel/smpboot.c:105)
[ 3912.090605] kthread (kernel/kthread.c:210)
[ 3912.090605] ? complete (kernel/sched/completion.c:35)
[ 3912.090605] ? kthread_create_on_node (kernel/kthread.c:176)
[ 3912.090605] ret_from_fork (arch/x86/kernel/entry_64.S:553)
[ 3912.090605] ? kthread_create_on_node (kernel/kthread.c:176)
[ 3912.090605] Dumping ftrace buffer:
[ 3912.090605] (ftrace buffer empty)
Nothing else is printed past this point.
Thanks,
Sasha
next prev parent reply other threads:[~2014-05-12 18:49 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-19 18:08 sched: hang in migrate_swap Sasha Levin
2014-02-20 4:32 ` Michael wang
2014-02-21 16:43 ` Sasha Levin
2014-02-22 1:45 ` Michael wang
2014-02-24 3:23 ` Sasha Levin
2014-02-24 5:19 ` Michael wang
2014-02-24 5:54 ` Sasha Levin
2014-02-24 7:10 ` Peter Zijlstra
2014-02-24 10:14 ` Michael wang
2014-02-24 12:12 ` Peter Zijlstra
2014-02-24 13:10 ` Peter Zijlstra
2014-02-25 4:47 ` Michael wang
2014-02-25 10:49 ` Peter Zijlstra
2014-02-26 2:32 ` Michael wang
2014-02-24 18:21 ` Sasha Levin
2014-02-25 2:48 ` Michael wang
2014-02-25 11:03 ` Peter Zijlstra
2014-02-25 3:01 ` Michael wang
2014-02-27 13:33 ` [tip:sched/core] sched: Guarantee task priority in pick_next_task () tip-bot for Peter Zijlstra
2014-04-10 3:31 ` sched: hang in migrate_swap Sasha Levin
2014-04-10 6:59 ` Michael wang
2014-04-10 13:38 ` Kirill Tkhai
2014-04-11 14:32 ` Sasha Levin
2014-04-11 15:16 ` Kirill Tkhai
2014-05-12 18:48 ` Sasha Levin [this message]
2014-05-14 9:42 ` Kirill Tkhai
2014-05-14 10:13 ` Peter Zijlstra
2014-05-14 10:21 ` Kirill Tkhai
2014-05-14 10:26 ` Peter Zijlstra
2014-05-14 11:20 ` Peter Zijlstra
2015-06-15 19:38 ` Rafael David Tinoco
2015-06-15 19:47 ` Peter Zijlstra
2014-04-18 8:24 ` [tip:sched/urgent] sched: Check for stop task appearance when balancing happens tip-bot for Kirill Tkhai
2014-04-10 7:42 ` sched: hang in migrate_swap Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53711785.5010504@oracle.com \
--to=sasha.levin@oracle.com \
--cc=ktkhai@parallels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=tkhai@yandex.ru \
--cc=wangyun@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.