From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Catalin Iacob <iacobcatalin@gmail.com>,
	Dave Jones <davej@redhat.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Frederic Weisbecker <fweisbec@gmail.com>
Subject: [PATCH 3.17 09/25] irq_work: Force raised irq work to run on irq work interrupt
Date: Mon, 13 Oct 2014 04:25:02 +0200
Message-ID: <20141013022454.682378940@linuxfoundation.org>
In-Reply-To: <20141013022454.289398272@linuxfoundation.org>

3.17-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Frederic Weisbecker <fweisbec@gmail.com>

commit 76a33061b9323b7fdb220ae5fa116c10833ec22e upstream.

The nohz full kick, which restarts the tick when any resource depends
on it, can't be executed from just any context, given the operations it
performs on timers. If it is called from the scheduler or timer code,
chances are that we run into a deadlock.

This is why we run the nohz full kick from an irq work. That way we make
sure that the kick runs on a virgin context.

However, while that holds when irq work runs in its own dedicated
self-IPI, things are different for the many archs that don't support
self-triggered interrupts. To support them, irq works are also handled
by the timer interrupt as a fallback.

Now when irq works run on the timer interrupt, the context isn't blank.
More precisely, they can run in the context of the hrtimer that runs the
tick. But the nohz kick cancels and restarts this hrtimer, and cancelling
an hrtimer from its own callback isn't allowed. This is why we end up in
an endless loop:

	Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
	CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
	Workqueue: btrfs-endio-write normal_work_helper [btrfs]
	 ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
	 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
	 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
	Call Trace:
	 <NMI>  [<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
	 [<ffffffff8a7ef928>] panic+0xd4/0x207
	 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
	 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
	 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
	 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
	 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
	 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
	 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
	 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
	 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
	 [<ffffffff8a008268>] do_nmi+0xb8/0x100
	 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 <<EOE>>  <IRQ>  [<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
	 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
	 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
	 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
	 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
	 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
	 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
	 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
	 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
	 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
	 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
	 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
	 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
	 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
	 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
	 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

To fix this, we force non-lazy irq works to run on irq work self-IPIs
when available. Whether the arch can trigger irq work self-IPIs is
reported by arch_irq_work_has_interrupt().
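
As a rough illustration of the rule described above, here is a minimal
userspace sketch in C. It is not kernel code: struct cpu_model,
model_needs_tick() and model_tick() are hypothetical names made up for
this example, and the counters merely stand in for the per-cpu raised
and lazy lists. The has_self_ipi field plays the role of
arch_irq_work_has_interrupt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one CPU's irq work state; not kernel code. */
struct cpu_model {
	bool has_self_ipi;	/* stands in for arch_irq_work_has_interrupt() */
	unsigned int raised;	/* pending entries on the raised (non-lazy) list */
	unsigned int lazy;	/* pending entries on the lazy list */
};

/*
 * Models the list check in irq_work_needs_cpu(): raised works only keep
 * the tick alive when the arch has no self-IPI to run them.
 */
static bool model_needs_tick(const struct cpu_model *cpu)
{
	assert(cpu != NULL);
	if (cpu->raised == 0 || cpu->has_self_ipi)
		if (cpu->lazy == 0)
			return false;
	return true;
}

/*
 * Models irq_work_tick(): returns how many raised works were run from
 * the tick. Lazy works are always run from the tick.
 */
static unsigned int model_tick(struct cpu_model *cpu)
{
	unsigned int raised_run = 0;

	assert(cpu != NULL);
	if (cpu->raised != 0 && !cpu->has_self_ipi) {
		raised_run = cpu->raised;	/* fallback: no self-IPI available */
		cpu->raised = 0;
	}
	cpu->lazy = 0;
	return raised_run;
}
```

In this model, a CPU with has_self_ipi set leaves its raised works
untouched in model_tick(), so something like the nohz full kick would
only ever run from the dedicated interrupt and never from within the
tick hrtimer.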

Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/irq_work.h |    1 +
 kernel/irq_work.c        |   15 +++++++++++++--
 kernel/time/timer.c      |    2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -39,6 +39,7 @@ bool irq_work_queue_on(struct irq_work *
 #endif
 
 void irq_work_run(void);
+void irq_work_tick(void);
 void irq_work_sync(struct irq_work *work);
 
 #ifdef CONFIG_IRQ_WORK
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -115,8 +115,10 @@ bool irq_work_needs_cpu(void)
 
 	raised = &__get_cpu_var(raised_list);
 	lazy = &__get_cpu_var(lazy_list);
-	if (llist_empty(raised) && llist_empty(lazy))
-		return false;
+
+	if (llist_empty(raised) || arch_irq_work_has_interrupt())
+		if (llist_empty(lazy))
+			return false;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -171,6 +173,15 @@ void irq_work_run(void)
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
+void irq_work_tick(void)
+{
+	struct llist_head *raised = &__get_cpu_var(raised_list);
+
+	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
+		irq_work_run_list(raised);
+	irq_work_run_list(&__get_cpu_var(lazy_list));
+}
+
 /*
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1385,7 +1385,7 @@ void update_process_times(int user_tick)
 	rcu_check_callbacks(cpu, user_tick);
 #ifdef CONFIG_IRQ_WORK
 	if (in_irq())
-		irq_work_run();
+		irq_work_tick();
 #endif
 	scheduler_tick();
 	run_posix_cpu_timers(p);


