From: Peter Zijlstra <peterz@infradead.org>
To: Dave Jones <davej@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Subject: nohz fail (was: perf related boot hang.)
Date: Thu, 7 Aug 2014 11:03:33 +0200
Message-ID: <20140807090333.GL19379@twins.programming.kicks-ass.net>
In-Reply-To: <20140806194656.GA11570@redhat.com>
On Wed, Aug 06, 2014 at 03:46:56PM -0400, Dave Jones wrote:
> This one happened during runtime, but I got a whole stack..
>
>
> Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
> CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
> Workqueue: btrfs-endio-write normal_work_helper [btrfs]
> ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
> ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
> ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
> Call Trace:
> <NMI> [<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
> [<ffffffff8a7ef928>] panic+0xd4/0x207
> [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
> [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
> [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
> [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
> [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
> [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
> [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
> [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
> [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
> [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
> [<ffffffff8a008268>] do_nmi+0xb8/0x100
> [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
> [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
OK, so that part is just the watchdog triggering; the part below is
the screwy bit:
> <<EOE>> <IRQ> [<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
> [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
> [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
> [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
> [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
> [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
> [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
> [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
> [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
> [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
> [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
> [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
> [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
> [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
> [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
> [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
> [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
> [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
> [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
> [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
> [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80
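And that looks like someone trying to cancel a timer from within a timer
handler: tick_sched_timer() ends up in tick_nohz_restart(), which calls
hrtimer_cancel(). I guess that won't work, seeing how cancel waits for
the timer handler to complete, and here the handler is the caller.

This happens because of the fallback irq_work_run() in the tick
(update_process_times()), which runs nohz_full_kick_work_func() from
inside the tick's hrtimer handler.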
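
To make the self-cancel problem concrete, here is a rough userspace
sketch, not kernel code. It assumes the timer being cancelled is the one
whose handler is currently running, as the trace suggests. All toy_*
names are hypothetical; only the return convention is modelled on
hrtimer_try_to_cancel() (1 = removed, 0 = not queued, -1 = handler
running). The real hrtimer_cancel() retries until it gets a non-negative
result; the sketch bounds the loop so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct hrtimer; every toy_* name is hypothetical. */
struct toy_timer {
	bool queued;            /* timer is armed */
	bool callback_running;  /* its handler is executing right now */
};

/*
 * Modelled on the hrtimer_try_to_cancel() return convention:
 *   1  timer was queued and has been removed
 *   0  timer was not queued
 *  -1  handler is running, timer cannot be stopped right now
 */
static int toy_try_to_cancel(struct toy_timer *t)
{
	if (t->callback_running)
		return -1;
	if (!t->queued)
		return 0;
	t->queued = false;
	return 1;
}

/*
 * Bounded variant of the cancel loop. The kernel's hrtimer_cancel()
 * retries without a limit; the limit here only keeps the demo finite.
 * Returns the number of attempts used, or -1 if it gave up.
 */
static int toy_cancel_bounded(struct toy_timer *t, int max_spins)
{
	for (int i = 1; i <= max_spins; i++) {
		if (toy_try_to_cancel(t) >= 0)
			return i;
	}
	return -1;
}

/* Run fn with callback_running set for the duration of the call. */
static int toy_run_handler(struct toy_timer *t,
			   int (*fn)(struct toy_timer *))
{
	int ret;

	assert(fn != NULL);
	t->queued = false;
	t->callback_running = true;
	ret = fn(t);
	t->callback_running = false;
	return ret;
}

/* A handler that tries to cancel its own timer. */
static int toy_self_cancel_handler(struct toy_timer *t)
{
	return toy_cancel_bounded(t, 1000);
}
```

Calling toy_run_handler(&t, toy_self_cancel_handler) never gets a
non-negative answer from toy_try_to_cancel(), because callback_running
cannot clear until the handler returns. Without the bound that is a
spin with no exit, which would match the watchdog firing above.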