From: Peter Zijlstra <peterz@infradead.org>
To: Dave Jones <davej@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Subject: nohz fail (was: perf related boot hang.)
Date: Thu, 7 Aug 2014 11:03:33 +0200
Message-ID: <20140807090333.GL19379@twins.programming.kicks-ass.net>
In-Reply-To: <20140806194656.GA11570@redhat.com>
On Wed, Aug 06, 2014 at 03:46:56PM -0400, Dave Jones wrote:
> This one happened during runtime, but I got a whole stack..
>
>
> Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
> CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
> Workqueue: btrfs-endio-write normal_work_helper [btrfs]
> ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
> ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
> ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
> Call Trace:
> <NMI> [<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
> [<ffffffff8a7ef928>] panic+0xd4/0x207
> [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
> [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
> [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
> [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
> [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
> [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
> [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
> [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
> [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
> [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
> [<ffffffff8a008268>] do_nmi+0xb8/0x100
> [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
OK, so that part is just the watchdog triggering; the part below is
the screwy bit:
> <<EOE>> <IRQ> [<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
> [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
> [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
> [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
> [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
> [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
> [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
> [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
> [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
> [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
> [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
> [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
> [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
> [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
> [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
> [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
> [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
> [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
> [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
> [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
> [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80
And that looks like someone trying to cancel a timer from within a timer
handler: tick_sched_timer() ends up in tick_nohz_restart(), which calls
hrtimer_cancel(). I guess that won't work, seeing how cancel waits for
the timer handler to complete.

This is because of the fallback irq_work_run() in the tick
(update_process_times()).
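
To make the failure mode concrete, here is a toy user-space model of it. This is a sketch under assumptions, not the kernel code: the toy_* names, the two-flag timer struct and the max_spins cap are all made up for illustration (the real cancel loop has no cap, which is why the watchdog fires). It only shows why a cancel that waits for the handler can never succeed when it is called from that same handler.

```c
/* Toy model only: names and fields are hypothetical, not kernel APIs. */
#include <stdbool.h>

struct toy_hrtimer {
	bool enqueued;          /* timer is queued and waiting to fire */
	bool callback_running;  /* its handler is executing right now */
};

/*
 * Returns 1 if a queued timer was removed, 0 if it was not queued,
 * and -1 if the handler is currently running and cannot be stopped.
 */
static int toy_try_to_cancel(struct toy_hrtimer *t)
{
	if (t->callback_running)
		return -1;
	if (t->enqueued) {
		t->enqueued = false;
		return 1;
	}
	return 0;
}

/*
 * Cancel that retries until the handler has finished. max_spins is an
 * assumed cap so the demo terminates; without it this spins forever.
 * Returns -1 if the cap was reached.
 */
static int toy_cancel(struct toy_hrtimer *t, unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		int ret = toy_try_to_cancel(t);

		if (ret >= 0)
			return ret;
		/* a real implementation would relax the CPU and retry */
	}
	return -1;
}

/*
 * Handler that cancels its own timer, analogous to the tick handler
 * reaching tick_nohz_restart() through irq_work_run() in the trace.
 */
static int toy_handler_cancels_self(struct toy_hrtimer *t)
{
	int ret;

	t->callback_running = true;
	ret = toy_cancel(t, 1000);      /* can never see the handler finish */
	t->callback_running = false;
	return ret;
}
```

Calling toy_cancel() on a queued timer from outside the handler returns 1, whereas toy_handler_cancels_self() exhausts its spin budget and returns -1, because callback_running cannot clear until the function that is doing the waiting returns.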