From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Rabin Vincent <rabin.vincent@axis.com>,
Tomeu Vizoso <tomeu.vizoso@gmail.com>,
Jesper Nilsson <jesper.nilsson@axis.com>
Subject: [PATCH 3.10 28/55] workqueue: fix hang involving racing cancel[_delayed]_work_sync()s for PREEMPT_NONE
Date: Tue, 24 Mar 2015 16:43:08 +0100 [thread overview]
Message-ID: <20150324154159.928848686@linuxfoundation.org> (raw)
In-Reply-To: <20150324154158.748418668@linuxfoundation.org>
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo <tj@kernel.org>
commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 upstream.
cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.
try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation. In that case, try_to_grab_pending() returns -ENOENT. In
this case, __cancel_work_timer() currently invokes flush_work(). The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping
Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority. Let's say task A just got woken
up from flush_work() by the completion of the target work item. If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing. This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.
task A task B worker
executing work
__cancel_work_timer()
try_to_grab_pending()
set work CANCELING
flush_work()
block for work completion
completion, wakes up A
__cancel_work_timer()
while (forever) {
try_to_grab_pending()
-ENOENT as work is being canceled
flush_work()
false as work is no longer executing
}
This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.
Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
v3: bit_waitqueue() can't be used for work items defined in vmalloc
area. Switched to custom wake function which matches the target
work item and exclusive wait and wakeup.
v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
the target bit waitqueue has wait_bit_queue's on it. Use
DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu
Vizoso.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/workqueue.h | 3 +-
kernel/workqueue.c | 56 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 54 insertions(+), 5 deletions(-)
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -71,7 +71,8 @@ enum {
/* data contains off-queue information when !WORK_STRUCT_PWQ */
WORK_OFFQ_FLAG_BASE = WORK_STRUCT_COLOR_SHIFT,
- WORK_OFFQ_CANCELING = (1 << WORK_OFFQ_FLAG_BASE),
+ __WORK_OFFQ_CANCELING = WORK_OFFQ_FLAG_BASE,
+ WORK_OFFQ_CANCELING = (1 << __WORK_OFFQ_CANCELING),
/*
* When a work item is off queue, its high bits point to the last
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2861,19 +2861,57 @@ bool flush_work(struct work_struct *work
}
EXPORT_SYMBOL_GPL(flush_work);
+struct cwt_wait {
+ wait_queue_t wait;
+ struct work_struct *work;
+};
+
+static int cwt_wakefn(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+ struct cwt_wait *cwait = container_of(wait, struct cwt_wait, wait);
+
+ if (cwait->work != key)
+ return 0;
+ return autoremove_wake_function(wait, mode, sync, key);
+}
+
static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
{
+ static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq);
unsigned long flags;
int ret;
do {
ret = try_to_grab_pending(work, is_dwork, &flags);
/*
- * If someone else is canceling, wait for the same event it
- * would be waiting for before retrying.
+ * If someone else is already canceling, wait for it to
+ * finish. flush_work() doesn't work for PREEMPT_NONE
+ * because we may get scheduled between @work's completion
+ * and the other canceling task resuming and clearing
+ * CANCELING - flush_work() will return false immediately
+ * as @work is no longer busy, try_to_grab_pending() will
+ * return -ENOENT as @work is still being canceled and the
+ * other canceling task won't be able to clear CANCELING as
+ * we're hogging the CPU.
+ *
+ * Let's wait for completion using a waitqueue. As this
+ * may lead to the thundering herd problem, use a custom
+ * wake function which matches @work along with exclusive
+ * wait and wakeup.
*/
- if (unlikely(ret == -ENOENT))
- flush_work(work);
+ if (unlikely(ret == -ENOENT)) {
+ struct cwt_wait cwait;
+
+ init_wait(&cwait.wait);
+ cwait.wait.func = cwt_wakefn;
+ cwait.work = work;
+
+ prepare_to_wait_exclusive(&cancel_waitq, &cwait.wait,
+ TASK_UNINTERRUPTIBLE);
+ if (work_is_canceling(work))
+ schedule();
+ finish_wait(&cancel_waitq, &cwait.wait);
+ }
} while (unlikely(ret < 0));
/* tell other tasks trying to grab @work to back off */
@@ -2882,6 +2920,16 @@ static bool __cancel_work_timer(struct w
flush_work(work);
clear_work_data(work);
+
+ /*
+ * Paired with prepare_to_wait() above so that either
+ * waitqueue_active() is visible here or !work_is_canceling() is
+ * visible there.
+ */
+ smp_mb();
+ if (waitqueue_active(&cancel_waitq))
+ __wake_up(&cancel_waitq, TASK_NORMAL, 1, work);
+
return ret;
}
next prev parent reply other threads:[~2015-03-24 16:13 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-24 15:42 [PATCH 3.10 00/55] 3.10.73-stable review Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 01/55] sparc32: destroy_context() and switch_mm() needs to disable interrupts Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 02/55] sparc: semtimedop() unreachable due to comparison error Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 03/55] sparc: perf: Remove redundant perf_pmu_{en|dis}able calls Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 04/55] sparc: perf: Make counting mode actually work Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 05/55] sparc: Touch NMI watchdog when walking cpus and calling printk Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 06/55] sparc64: Fix several bugs in memmove() Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 07/55] net: sysctl_net_core: check SNDBUF and RCVBUF for min length Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 08/55] rds: avoid potential stack overflow Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 09/55] inet_diag: fix possible overflow in inet_diag_dump_one_icsk() Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 10/55] caif: fix MSG_OOB test in caif_seqpkt_recvmsg() Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 11/55] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 12/55] Revert "net: cx82310_eth: use common match macro" Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 13/55] tcp: fix tcp fin memory accounting Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 14/55] net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 15/55] tcp: make connect() mem charging friendly Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 17/55] drm/radeon: do a posting read in evergreen_set_irq Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 18/55] drm/radeon: do a posting read in r100_set_irq Greg Kroah-Hartman
2015-03-24 15:42 ` [PATCH 3.10 19/55] drm/radeon: do a posting read in r600_set_irq Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 20/55] drm/radeon: do a posting read in si_set_irq Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 21/55] drm/radeon: do a posting read in rs600_set_irq Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 23/55] fuse: set stolen page uptodate Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 24/55] fuse: notify: dont move pages Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 25/55] virtio_console: init work unconditionally Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 26/55] Change email address for 8250_pci Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 27/55] can: add missing initialisations in CAN related skbuffs Greg Kroah-Hartman
2015-03-24 15:43 ` Greg Kroah-Hartman [this message]
2015-03-24 15:43 ` [PATCH 3.10 29/55] tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 30/55] spi: pl022: Fix race in giveback() leading to driver lock-up Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 31/55] ALSA: control: Add sanity checks for user ctl id name string Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 32/55] ALSA: hda - Fix built-in mic on Compaq Presario CQ60 Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 33/55] ALSA: hda - Dont access stereo amps for mono channel widgets Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 34/55] ALSA: hda - Set single_adc_amp flag for CS420x codecs Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 35/55] ALSA: hda - Add workaround for MacBook Air 5,2 built-in mic Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 36/55] ALSA: hda - Treat stereo-to-mono mix properly Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 37/55] regulator: Only enable disabled regulators on resume Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 38/55] regulator: core: Fix enable GPIO reference counting Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 39/55] nilfs2: fix deadlock of segment constructor during recovery Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 40/55] xen-pciback: limit guest control of command register Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 41/55] libsas: Fix Kernel Crash in smp_execute_task Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 42/55] crypto: aesni - fix memory usage in GCM decryption Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 43/55] x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 44/55] x86/fpu: Drop_fpu() should not assume that tsk equals current Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 45/55] x86/vdso: Fix the build on GCC5 Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 46/55] powerpc/smp: Wait until secondaries are active & online Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 47/55] ipvs: add missing ip_vs_pe_put in sync code Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 48/55] ipvs: rerouting to local clients is not needed anymore Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 49/55] ARM: at91: pm: fix at91rm9200 standby Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 50/55] target: Fix reference leak in target_get_sess_cmd() error path Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 51/55] iscsi-target: Avoid early conn_logout_comp for iser connections Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 52/55] target/pscsi: Fix NULL pointer dereference in get_device_type Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 53/55] target: Fix R_HOLDER bit usage for AllRegistrants Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 54/55] target: Allow AllRegistrants to re-RESERVE existing reservation Greg Kroah-Hartman
2015-03-24 15:43 ` [PATCH 3.10 55/55] target: Allow Write Exclusive non-reservation holders to READ Greg Kroah-Hartman
2015-03-25 2:34 ` [PATCH 3.10 00/55] 3.10.73-stable review Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150324154159.928848686@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=jesper.nilsson@axis.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rabin.vincent@axis.com \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
--cc=tomeu.vizoso@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox