From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
	Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
	Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Johannes Weiner <hannes@cmpxchg.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Matthew Wilcox <matthew@wil.cx>, Chuck Lever <cel@citi.umich.edu>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Ingo Molnar <mingo@elte.hu>
Subject: [patch 15/56] wait: prevent exclusive waiter starvation
Date: Tue, 10 Feb 2009 16:24:45 -0800
Message-ID: <20090211002445.GP14660@kroah.com>
In-Reply-To: <20090211002328.GA14660@kroah.com>

[-- Attachment #1: wait-prevent-exclusive-waiter-starvation.patch --]
[-- Type: text/plain, Size: 6808 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.
------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 777c6c5f1f6e757ae49ecca2ed72d6b1f523c007 upstream.

With exclusive waiters, every process woken up through the wait queue must
ensure that the next waiter down the line is woken when it has finished.

Interruptible waiters don't do that when aborting due to a signal.  And if
an aborting waiter is concurrently woken up through the waitqueue, no one
will ever wake up the next waiter.

This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently.  The aborted contender
didn't acquire the lock and therefore never did an unlock followed by
waking up the next waiter.

Add abort_exclusive_wait() which removes the process' wait descriptor from
the waitqueue, iff still queued, or wakes up the next waiter otherwise.
It does so under the waitqueue lock.  Racing with a wake up means the
aborting process is either already woken (removed from the queue) and will
wake up the next waiter, or it will remove itself from the queue and the
concurrent wake up will apply to the next waiter after it.

Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they were interrupted by other means than a wake
up through the queue.
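
The race described above can be replayed in a small userspace model.  This
is only a sketch under stated assumptions: every model_* name is
hypothetical and not a kernel API, the queue is a plain array rather than a
list protected by q->lock, and a single thread replays one interleaving at
a time instead of racing for real.  It assumes wake-ups dequeue the woken
waiter, as autoremove_wake_function() does on a successful wake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_MAX_WAITERS 8

struct model_queue {
	int queued[MODEL_MAX_WAITERS];	/* ids of queued exclusive waiters, FIFO */
	size_t nr_queued;
	int woken[MODEL_MAX_WAITERS];	/* ids that received a wake-up, in order */
	size_t nr_woken;
};

static void model_enqueue(struct model_queue *q, int id)
{
	assert(q->nr_queued < MODEL_MAX_WAITERS);
	q->queued[q->nr_queued++] = id;
}

/* Remove @id from the queue; returns true if it was still queued. */
static bool model_dequeue(struct model_queue *q, int id)
{
	for (size_t i = 0; i < q->nr_queued; i++) {
		if (q->queued[i] != id)
			continue;
		for (size_t j = i + 1; j < q->nr_queued; j++)
			q->queued[j - 1] = q->queued[j];
		q->nr_queued--;
		return true;
	}
	return false;
}

/* Wake one exclusive waiter: dequeue the head and record the wake-up. */
static void model_wake_one(struct model_queue *q)
{
	if (q->nr_queued == 0)
		return;
	int id = q->queued[0];
	model_dequeue(q, id);
	assert(q->nr_woken < MODEL_MAX_WAITERS);
	q->woken[q->nr_woken++] = id;
}

/*
 * Same decision as abort_exclusive_wait(): dequeue the caller if it is
 * still queued, otherwise it was already woken, so pass the wake-up on.
 */
static void model_abort_exclusive_wait(struct model_queue *q, int id)
{
	if (!model_dequeue(q, id))
		model_wake_one(q);
}

/*
 * Waiters 1 and 2 are queued and waiter 1 aborts because of a signal.
 * @wake_first selects whether the wake-up lands before or after the abort.
 * Returns true if waiter 2 ends up woken.
 */
bool model_next_waiter_woken(bool use_abort, bool wake_first)
{
	struct model_queue q = { 0 };

	model_enqueue(&q, 1);
	model_enqueue(&q, 2);
	if (wake_first)
		model_wake_one(&q);
	if (use_abort)
		model_abort_exclusive_wait(&q, 1);
	else
		model_dequeue(&q, 1);	/* old path: finish_wait() only dequeues */
	if (!wake_first)
		model_wake_one(&q);

	for (size_t i = 0; i < q.nr_woken; i++)
		if (q.woken[i] == 2)
			return true;
	return false;
}
```

In this model, model_next_waiter_woken(false, true) is the only combination
that returns false: the wake-up is consumed by the aborting waiter and
waiter 2 is never woken.  With use_abort set, both orderings end with
waiter 2 woken.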

[akpm@linux-foundation.org: coding-style fixes]
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Mentored-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/wait.h |   11 +++++++--
 kernel/sched.c       |    4 +--
 kernel/wait.c        |   59 ++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 63 insertions(+), 11 deletions(-)

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -141,6 +141,8 @@ static inline void __remove_wait_queue(w
 	list_del(&old->task_list);
 }
 
+void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
+			int nr_exclusive, int sync, void *key);
 void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
 extern void __wake_up_locked(wait_queue_head_t *q, unsigned int mode);
 extern void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr);
@@ -342,16 +344,19 @@ do {									\
 	for (;;) {							\
 		prepare_to_wait_exclusive(&wq, &__wait,			\
 					TASK_INTERRUPTIBLE);		\
-		if (condition)						\
+		if (condition) {					\
+			finish_wait(&wq, &__wait);			\
 			break;						\
+		}							\
 		if (!signal_pending(current)) {				\
 			schedule();					\
 			continue;					\
 		}							\
 		ret = -ERESTARTSYS;					\
+		abort_exclusive_wait(&wq, &__wait, 			\
+				TASK_INTERRUPTIBLE, NULL);		\
 		break;							\
 	}								\
-	finish_wait(&wq, &__wait);					\
 } while (0)
 
 #define wait_event_interruptible_exclusive(wq, condition)		\
@@ -440,6 +445,8 @@ extern long interruptible_sleep_on_timeo
 void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
 void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
 void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
+			unsigned int mode, void *key);
 int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4556,8 +4556,8 @@ EXPORT_SYMBOL(default_wake_function);
  * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
  * zero in this (rare) case, and we handle it by continuing to scan the queue.
  */
-static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
-			     int nr_exclusive, int sync, void *key)
+void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
+			int nr_exclusive, int sync, void *key)
 {
 	wait_queue_t *curr, *next;
 
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -101,6 +101,15 @@ prepare_to_wait_exclusive(wait_queue_hea
 }
 EXPORT_SYMBOL(prepare_to_wait_exclusive);
 
+/*
+ * finish_wait - clean up after waiting in a queue
+ * @q: waitqueue waited on
+ * @wait: wait descriptor
+ *
+ * Sets current thread back to running state and removes
+ * the wait descriptor from the given waitqueue if still
+ * queued.
+ */
 void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
 {
 	unsigned long flags;
@@ -127,6 +136,39 @@ void finish_wait(wait_queue_head_t *q, w
 }
 EXPORT_SYMBOL(finish_wait);
 
+/*
+ * abort_exclusive_wait - abort exclusive waiting in a queue
+ * @q: waitqueue waited on
+ * @wait: wait descriptor
+ * @mode: runstate of the waiter to be woken
+ * @key: key to identify a wait bit queue or %NULL
+ *
+ * Sets current thread back to running state and removes
+ * the wait descriptor from the given waitqueue if still
+ * queued.
+ *
+ * Wakes up the next waiter if the caller is concurrently
+ * woken up through the queue.
+ *
+ * This prevents waiter starvation where an exclusive waiter
+ * aborts and is woken up concurrently and no one wakes up
+ * the next waiter.
+ */
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
+			unsigned int mode, void *key)
+{
+	unsigned long flags;
+
+	__set_current_state(TASK_RUNNING);
+	spin_lock_irqsave(&q->lock, flags);
+	if (!list_empty(&wait->task_list))
+		list_del_init(&wait->task_list);
+	else if (waitqueue_active(q))
+		__wake_up_common(q, mode, 1, 0, key);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(abort_exclusive_wait);
+
 int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
 	int ret = default_wake_function(wait, mode, sync, key);
@@ -187,17 +229,20 @@ int __sched
 __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 			int (*action)(void *), unsigned mode)
 {
-	int ret = 0;
-
 	do {
+		int ret;
+
 		prepare_to_wait_exclusive(wq, &q->wait, mode);
-		if (test_bit(q->key.bit_nr, q->key.flags)) {
-			if ((ret = (*action)(q->key.flags)))
-				break;
-		}
+		if (!test_bit(q->key.bit_nr, q->key.flags))
+			continue;
+		ret = action(q->key.flags);
+		if (!ret)
+			continue;
+		abort_exclusive_wait(wq, &q->wait, mode, &q->key);
+		return ret;
 	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
 	finish_wait(wq, &q->wait);
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(__wait_on_bit_lock);
 

