public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Subject: nohz fail (was: perf related boot hang.)
Date: Thu, 7 Aug 2014 11:03:33 +0200
Message-ID: <20140807090333.GL19379@twins.programming.kicks-ass.net>
In-Reply-To: <20140806194656.GA11570@redhat.com>

On Wed, Aug 06, 2014 at 03:46:56PM -0400, Dave Jones wrote:
> This one happened during runtime, but I got a whole stack..
> 
> 
>  Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
>  CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
>  Workqueue: btrfs-endio-write normal_work_helper [btrfs]
>   ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
>   ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
>   ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
>  Call Trace:
>   <NMI>  [<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
>   [<ffffffff8a7ef928>] panic+0xd4/0x207
>   [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
>   [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
>   [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
>   [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
>   [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
>   [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
>   [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
>   [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
>   [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
>   [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
>   [<ffffffff8a008268>] do_nmi+0xb8/0x100
>   [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0

OK, so that part is just the NMI watchdog triggering; the part below is
the screwy bit:

>   <<EOE>>  <IRQ>  [<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
>   [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
>   [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
>   [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
>   [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
>   [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
>   [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
>   [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
>   [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
>   [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
>   [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
>   [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
>   [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
>   [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
>   [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
>   [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
>   [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
>   [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
>   [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
>   [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
>   [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

And that looks like someone trying to cancel a timer from within a timer
handler. I guess that won't work, seeing how hrtimer_cancel() waits for
the timer handler to complete.

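This is because of the fallback irq_work_run() call in the tick
(update_process_times()), which runs the pending nohz_full kick work from
inside the tick_sched_timer() hrtimer handler.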
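
A minimal userspace sketch of the failure mode in the trace above, assuming a
toy timer model rather than the kernel's real hrtimer code. Every toy_* name
is hypothetical and exists only for illustration. The one kernel behaviour it
leans on is that hrtimer_try_to_cancel() returns -1 while the timer's
callback is running and that hrtimer_cancel() retries until that stops being
the case. The real hrtimer_cancel() has no retry limit; the bound here is
only so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; this is NOT the kernel's struct hrtimer. */
struct toy_timer {
	bool queued;                     /* armed and waiting to fire */
	bool callback_running;           /* handler is executing right now */
	void (*fn)(struct toy_timer *t); /* handler to call on expiry */
	int cancel_spins;                /* retries made by cancel attempts */
};

/*
 * Modelled on hrtimer_try_to_cancel():
 *  -1 the handler is currently running, so the timer cannot be stopped
 *   1 the timer was queued and has been removed
 *   0 the timer was not active
 */
static int toy_try_to_cancel(struct toy_timer *t)
{
	if (t->callback_running)
		return -1;
	if (t->queued) {
		t->queued = false;
		return 1;
	}
	return 0;
}

/*
 * Modelled on hrtimer_cancel(), which keeps retrying until the handler
 * has finished. The max_spins bound is an assumption added for this
 * sketch; it returns -1 when the bound is hit.
 */
static int toy_cancel_bounded(struct toy_timer *t, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		int ret = toy_try_to_cancel(t);

		if (ret >= 0)
			return ret;
		t->cancel_spins++;
	}
	return -1;
}

static int last_cancel_result;

/*
 * Stands in for the chain in the trace: tick_sched_timer() ->
 * update_process_times() -> irq_work_run() -> ... ->
 * tick_nohz_restart() -> hrtimer_cancel() on the tick timer itself.
 */
static void toy_tick_handler(struct toy_timer *t)
{
	last_cancel_result = toy_cancel_bounded(t, 1000);
}

/* Expire the timer: mark the handler as running while it executes. */
static void toy_fire(struct toy_timer *t)
{
	assert(t != NULL && t->fn != NULL);
	t->queued = false;
	t->callback_running = true;
	t->fn(t);
	t->callback_running = false;
}

/* Cancel from inside the handler: can never succeed. */
static int toy_demo_self_cancel(void)
{
	struct toy_timer t = { .queued = true, .fn = toy_tick_handler };

	toy_fire(&t);
	return last_cancel_result;
}

/* Cancel from outside the handler: succeeds on the first attempt. */
static int toy_demo_outside_cancel(void)
{
	struct toy_timer t = { .queued = true, .fn = toy_tick_handler };

	return toy_cancel_bounded(&t, 1000);
}
```

Called from a main(), toy_demo_self_cancel() returns -1 because every retry
sees the handler still running, while toy_demo_outside_cancel() returns 1.
Without the bound, the first case would spin forever with interrupts off,
which is the kind of state the hard lockup watchdog reports.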



Thread overview: 18+ messages
2014-08-06 14:36 perf related boot hang Dave Jones
2014-08-06 16:19 ` Peter Zijlstra
2014-08-06 16:23   ` Dave Jones
2014-08-06 19:46   ` Dave Jones
2014-08-07  9:03     ` Peter Zijlstra [this message]
2014-08-07 13:16       ` nohz fail (was: perf related boot hang.) Frederic Weisbecker
2014-08-11 20:09         ` Dave Jones
2014-08-20 20:31           ` Catalin Iacob
2014-08-21 14:56             ` Frederic Weisbecker
2014-08-22  6:01               ` Catalin Iacob
2014-08-22 14:00                 ` Dave Jones
2014-09-01 20:14                   ` Frederic Weisbecker
2014-09-02 13:41                     ` Dave Jones
2014-09-02 18:23                     ` Catalin Iacob
2014-09-04 20:07                       ` Catalin Iacob
2014-09-04 20:17                         ` Frederic Weisbecker
2014-09-04 21:05                           ` Catalin Iacob
2014-09-04 21:29                             ` Frederic Weisbecker
