From: Peter Zijlstra <peterz@infradead.org>
To: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Subject: nohz fail (was: perf related boot hang.)
Date: Thu, 7 Aug 2014 11:03:33 +0200
Message-ID: <20140807090333.GL19379@twins.programming.kicks-ass.net>
In-Reply-To: <20140806194656.GA11570@redhat.com>

On Wed, Aug 06, 2014 at 03:46:56PM -0400, Dave Jones wrote:
> This one happened during runtime, but I got a whole stack..
> 
> 
>  Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
>  CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
>  Workqueue: btrfs-endio-write normal_work_helper [btrfs]
>   ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
>   ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
>   ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
>  Call Trace:
>   <NMI>  [<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
>   [<ffffffff8a7ef928>] panic+0xd4/0x207
>   [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
>   [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
>   [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
>   [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
>   [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
>   [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
>   [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
>   [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
>   [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
>   [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
>   [<ffffffff8a008268>] do_nmi+0xb8/0x100
>   [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
>   [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0

OK, so that part is just the watchdog triggering; the part below is the
screwy bit:

>   <<EOE>>  <IRQ>  [<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
>   [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
>   [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
>   [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
>   [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
>   [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
>   [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
>   [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
>   [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
>   [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
>   [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
>   [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
>   [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
>   [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
>   [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
>   [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
>   [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
>   [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
>   [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
>   [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
>   [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

And that looks like someone trying to cancel a timer from within a timer
handler. I guess that won't work, seeing how cancel waits for the timer
handler to complete.
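
To make the problem concrete, here is a minimal userspace sketch. It
assumes the cancel path simply retries for as long as the timer's handler
is marked as running, which is what the hrtimer_cancel ->
hrtimer_try_to_cancel frames in the trace suggest. Every name below
(model_timer, model_cancel, and so on) and the retry bound are made up
for illustration; this is not the kernel code, and a real waiter would
have no bound and would just keep spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer; not the kernel's struct hrtimer. */
struct model_timer {
	bool queued;                        /* armed, waiting to expire */
	bool callback_running;              /* handler is executing right now */
	void (*fn)(struct model_timer *t);  /* the handler */
	int cancel_result;                  /* result of the handler's cancel attempt */
};

/* Returns 1 if the timer was dequeued, 0 if it was not armed,
 * and -1 if its handler is currently running. */
static int model_try_to_cancel(struct model_timer *t)
{
	if (t->callback_running)
		return -1;
	if (t->queued) {
		t->queued = false;
		return 1;
	}
	return 0;
}

/* Retry until the handler has finished. The bound exists only so the
 * model terminates; -1 here means "would have spun forever". */
static int model_cancel(struct model_timer *t, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		int ret = model_try_to_cancel(t);

		if (ret >= 0)
			return ret;
	}
	return -1;
}

/* Expire the timer: the handler counts as running for the whole call. */
static void model_run_timer(struct model_timer *t)
{
	assert(t->queued && t->fn != NULL);
	t->queued = false;
	t->callback_running = true;
	t->fn(t);
	t->callback_running = false;
}

/* A handler that tries to cancel the very timer it is running from. */
static void self_cancelling_handler(struct model_timer *t)
{
	t->cancel_result = model_cancel(t, 1000);
}
```

Running model_run_timer() on a timer whose fn is self_cancelling_handler
leaves cancel_result at -1: the cancel can only succeed once the handler
returns, and the handler cannot return until the cancel succeeds.
Cancelling from outside the handler returns 0 or 1 straight away.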

This happens because of the fallback irq_work_run() call in the tick
(update_process_times()), which ends up running the nohz kick work from
inside the tick timer handler.
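
A rough sketch of that fallback path, again as a userspace model and not
the kernel implementation: pending work gets drained from the tick
handler, so whatever was queued runs in timer handler context. The queue,
its size, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8  /* arbitrary bound for the model */

typedef void (*work_fn)(void);

static work_fn pending[MAX_PENDING];
static size_t n_pending;
static bool in_tick_handler;   /* true while the model tick handler runs */
static bool work_ran_in_tick;  /* set by the work item, for inspection */

/* Queue a work item; returns false if the model queue is full. */
static bool model_queue_work(work_fn fn)
{
	if (n_pending == MAX_PENDING)
		return false;
	pending[n_pending++] = fn;
	return true;
}

/* Run and clear everything that is pending. */
static void model_run_pending(void)
{
	for (size_t i = 0; i < n_pending; i++) {
		assert(pending[i] != NULL);
		pending[i]();
	}
	n_pending = 0;
}

/* The tick handler, with the fallback: pending work is drained here,
 * so it runs in the tick's own context. */
static void model_tick(void)
{
	in_tick_handler = true;
	model_run_pending();
	in_tick_handler = false;
}

/* Stand-in for the kick work: it only records where it was run from. */
static void model_kick_work(void)
{
	work_ran_in_tick = in_tick_handler;
}
```

Queueing model_kick_work and then calling model_tick() leaves
work_ran_in_tick set, so anything the work does, including an attempt to
cancel the tick timer, happens while that timer's handler is still
running.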

