From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Johannes Weiner <hannes@cmpxchg.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Matthew Wilcox <matthew@wil.cx>, Chuck Lever <cel@citi.umich.edu>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Ingo Molnar <mingo@elte.hu>
Subject: [patch 03/53] wait: prevent exclusive waiter starvation
Date: Tue, 10 Feb 2009 10:59:46 -0800
Message-ID: <20090210185946.GD14308@kroah.com>
In-Reply-To: <20090210185924.GA14308@kroah.com>
[-- Attachment #1: wait-prevent-exclusive-waiter-starvation.patch --]
[-- Type: text/plain, Size: 6806 bytes --]
2.6.28-stable review patch. If anyone has any objections, please let us know.
------------------
From: Johannes Weiner <hannes@cmpxchg.org>
commit 777c6c5f1f6e757ae49ecca2ed72d6b1f523c007 upstream.
With exclusive waiters, every process woken up through the wait queue is
responsible for waking the next waiter in line once it has finished.
Interruptible waiters do not do this when they abort because of a
signal. If such an aborting waiter is concurrently woken up through the
waitqueue, no one will ever wake up the next waiter.
This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently. The aborted contender
never acquired the lock and therefore never performed the unlock that
would have woken up the next waiter.
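To make the lost wake-up concrete, here is a minimal single-threaded toy
model, assuming a wait queue reduced to a FIFO of waiter ids in which an
exclusive wake-up dequeues the head waiter (roughly what
autoremove_wake_function() does). All names (toy_wq, toy_wake_one,
toy_replay_old_abort, ...) are hypothetical and this is not kernel code.
Replaying the reported race with the old cleanup leaves the second
waiter asleep with no wake-up on its way.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

/* Hypothetical toy model, not kernel code: sleeping exclusive waiters in FIFO order. */
struct toy_wq {
	int ids[TOY_MAX_WAITERS];
	size_t len;
};

/* Queue waiter @id at the tail, like prepare_to_wait_exclusive(). */
static void toy_add_exclusive(struct toy_wq *q, int id)
{
	if (q->len < TOY_MAX_WAITERS)
		q->ids[q->len++] = id;
}

/* Dequeue @id if it is still queued; returns true if it was found. */
static bool toy_remove(struct toy_wq *q, int id)
{
	for (size_t i = 0; i < q->len; i++) {
		if (q->ids[i] != id)
			continue;
		for (size_t j = i + 1; j < q->len; j++)
			q->ids[j - 1] = q->ids[j];
		q->len--;
		return true;
	}
	return false;
}

/* Exclusive wake-up of one waiter: dequeue and return the head, or -1 if empty. */
static int toy_wake_one(struct toy_wq *q)
{
	if (q->len == 0)
		return -1;
	int id = q->ids[0];
	toy_remove(q, id);
	return id;
}

/*
 * Replays the reported race with the pre-patch cleanup: waiters 1 and 2
 * sleep on the lock; the holder unlocks and wakes waiter 1 while waiter 1
 * is aborting on a signal. Returns how many waiters are left asleep.
 */
static size_t toy_replay_old_abort(void)
{
	struct toy_wq q = { .len = 0 };

	toy_add_exclusive(&q, 1);
	toy_add_exclusive(&q, 2);

	toy_wake_one(&q);  /* the holder's unlock wakes waiter 1 ... */
	toy_remove(&q, 1); /* ... which aborts: finish_wait() finds it already dequeued */

	/* Waiter 1 never took the lock, so it never unlocks and never wakes waiter 2. */
	return q.len;
}
```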
Add abort_exclusive_wait(), which removes the process's wait descriptor
from the waitqueue if it is still queued, and wakes up the next waiter
otherwise. It does so under the waitqueue lock, so a racing wake-up has
only two possible outcomes: either the aborting process was already
woken (and thereby removed from the queue) and will wake up the next
waiter, or it removes itself from the queue first and the concurrent
wake-up applies to the next waiter after it.
Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they are interrupted by something other than a
wake-up through the queue.
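The control flow of the reworked __wait_on_bit_lock() can likewise be
sketched as a single-threaded toy. The names are hypothetical: plain
non-atomic bit helpers stand in for test_bit()/test_and_set_bit(), and an
enum records which cleanup the real code would call instead of touching a
queue. The action callback is an assumption for illustration: it "sleeps"
until a simulated holder clears the bit, or returns -EINTR if a signal is
pending.

```c
#include <errno.h>
#include <stdbool.h>

/* Which cleanup the real function would call before returning. */
enum toy_cleanup { TOY_FINISH_WAIT, TOY_ABORT_EXCLUSIVE_WAIT };

/* Non-atomic stand-ins for test_bit()/test_and_set_bit() (single-threaded toy). */
static bool toy_test_bit(int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

static bool toy_test_and_set_bit(int nr, unsigned long *word)
{
	bool old = toy_test_bit(nr, word);
	*word |= 1UL << nr;
	return old;
}

/*
 * Mirrors the patched loop. "continue" in a do/while jumps to the loop
 * condition, so every path that does not return retries test_and_set_bit().
 */
static int toy_wait_on_bit_lock(unsigned long *word, int nr,
				int (*action)(void *), void *ctx,
				enum toy_cleanup *cleanup)
{
	do {
		int ret;

		/* prepare_to_wait_exclusive() would (re)queue us here. */
		if (!toy_test_bit(nr, word))
			continue;
		ret = action(ctx);
		if (!ret)
			continue;
		*cleanup = TOY_ABORT_EXCLUSIVE_WAIT; /* interrupted: hand on any wake-up */
		return ret;
	} while (toy_test_and_set_bit(nr, word));
	*cleanup = TOY_FINISH_WAIT;                  /* lock bit acquired */
	return 0;
}

/* Hypothetical action context: a simulated holder releases the bit after N sleeps. */
struct toy_ctx {
	unsigned long *word;
	int nr;
	int sleeps_until_release;
	bool signal_pending;
};

static int toy_action(void *p)
{
	struct toy_ctx *c = p;

	if (c->signal_pending)
		return -EINTR;
	if (--c->sleeps_until_release == 0)
		*c->word &= ~(1UL << c->nr); /* holder unlocks */
	return 0;
}
```

A return of 0 means the bit was acquired and finish_wait() is the right
cleanup; a nonzero action result is propagated only after
abort_exclusive_wait(), matching the patched code below.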
[akpm@linux-foundation.org: coding-style fixes]
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Mentored-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
include/linux/wait.h | 11 +++++++--
kernel/sched.c | 4 +--
kernel/wait.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 63 insertions(+), 11 deletions(-)
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -132,6 +132,8 @@ static inline void __remove_wait_queue(w
list_del(&old->task_list);
}
+void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
+ int nr_exclusive, int sync, void *key);
void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
extern void __wake_up_locked(wait_queue_head_t *q, unsigned int mode);
extern void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr);
@@ -333,16 +335,19 @@ do { \
for (;;) { \
prepare_to_wait_exclusive(&wq, &__wait, \
TASK_INTERRUPTIBLE); \
- if (condition) \
+ if (condition) { \
+ finish_wait(&wq, &__wait); \
break; \
+ } \
if (!signal_pending(current)) { \
schedule(); \
continue; \
} \
ret = -ERESTARTSYS; \
+ abort_exclusive_wait(&wq, &__wait, \
+ TASK_INTERRUPTIBLE, NULL); \
break; \
} \
- finish_wait(&wq, &__wait); \
} while (0)
#define wait_event_interruptible_exclusive(wq, condition) \
@@ -431,6 +436,8 @@ extern long interruptible_sleep_on_timeo
void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
+ unsigned int mode, void *key);
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4586,8 +4586,8 @@ EXPORT_SYMBOL(default_wake_function);
* started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
* zero in this (rare) case, and we handle it by continuing to scan the queue.
*/
-static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
- int nr_exclusive, int sync, void *key)
+void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
+ int nr_exclusive, int sync, void *key)
{
wait_queue_t *curr, *next;
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -91,6 +91,15 @@ prepare_to_wait_exclusive(wait_queue_hea
}
EXPORT_SYMBOL(prepare_to_wait_exclusive);
+/*
+ * finish_wait - clean up after waiting in a queue
+ * @q: waitqueue waited on
+ * @wait: wait descriptor
+ *
+ * Sets current thread back to running state and removes
+ * the wait descriptor from the given waitqueue if still
+ * queued.
+ */
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
@@ -117,6 +126,39 @@ void finish_wait(wait_queue_head_t *q, w
}
EXPORT_SYMBOL(finish_wait);
+/*
+ * abort_exclusive_wait - abort exclusive waiting in a queue
+ * @q: waitqueue waited on
+ * @wait: wait descriptor
+ * @mode: runstate of the waiter to be woken
+ * @key: key to identify a wait bit queue or %NULL
+ *
+ * Sets current thread back to running state and removes
+ * the wait descriptor from the given waitqueue if still
+ * queued.
+ *
+ * Wakes up the next waiter if the caller is concurrently
+ * woken up through the queue.
+ *
+ * This prevents waiter starvation where an exclusive waiter
+ * aborts and is woken up concurrently and no one wakes up
+ * the next waiter.
+ */
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
+ unsigned int mode, void *key)
+{
+ unsigned long flags;
+
+ __set_current_state(TASK_RUNNING);
+ spin_lock_irqsave(&q->lock, flags);
+ if (!list_empty(&wait->task_list))
+ list_del_init(&wait->task_list);
+ else if (waitqueue_active(q))
+ __wake_up_common(q, mode, 1, 0, key);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(abort_exclusive_wait);
+
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
int ret = default_wake_function(wait, mode, sync, key);
@@ -177,17 +219,20 @@ int __sched
__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
int (*action)(void *), unsigned mode)
{
- int ret = 0;
-
do {
+ int ret;
+
prepare_to_wait_exclusive(wq, &q->wait, mode);
- if (test_bit(q->key.bit_nr, q->key.flags)) {
- if ((ret = (*action)(q->key.flags)))
- break;
- }
+ if (!test_bit(q->key.bit_nr, q->key.flags))
+ continue;
+ ret = action(q->key.flags);
+ if (!ret)
+ continue;
+ abort_exclusive_wait(wq, &q->wait, mode, &q->key);
+ return ret;
} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
finish_wait(wq, &q->wait);
- return ret;
+ return 0;
}
EXPORT_SYMBOL(__wait_on_bit_lock);