linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
	Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Howells <dhowells@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>
Subject: [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v7
Date: Thu, 22 May 2014 11:40:51 +0100	[thread overview]
Message-ID: <20140522104051.GE23991@suse.de> (raw)
In-Reply-To: <20140522072001.GP30445@twins.programming.kicks-ass.net>

Changelog since v6
o Optimisation when PG_waiters is not available	(peterz)
o Documentation

Changelog since v5
o __always_inline where appropriate		(peterz)
o Documentation					(akpm)

Changelog since v4
o Remove dependency on io_schedule_timeout
o Push waiting logic down into waitqueue

From: Nick Piggin <npiggin@suse.de>

This patch introduces a new page flag for 64-bit capable machines,
PG_waiters, to signal there are *potentially* processes waiting on
PG_lock or PG_writeback.  If there are no possible waiters then we avoid
barriers, a waitqueue hash lookup and a failed wake_up in the unlock_page
and end_page_writeback paths. There is no guarantee that waiters exist if
PG_waiters is set as multiple pages can hash to the same waitqueue and we
cannot accurately detect if a waking process is the last waiter without
a reference count. When this happens, the bit is left set and a future
unlock or writeback completion will lookup the waitqueue and clear the
bit when there are no collisions. This adds a few branches to the fast
path but avoids bouncing a dirty cache line between CPUs. 32-bit machines
always take the slow path but the primary motivation for this patch is
large machines so I do not think that is a concern.

The test case used to evaluate this is a simple dd of a large file done
multiple times with the file deleted on each iterations. The size of the
file is 1/10th physical memory to avoid dirty page balancing. After each
dd there is a sync so the reported times do not vary much. By measuring
the time it takes to do async the impact of page_waitqueue overhead for
async IO is highlighted.

The test machine was single socket and UMA to avoid any scheduling or
NUMA artifacts. The performance results are reported based on a run with
no profiling.  Profile data is based on a separate run with oprofile running.

async dd
                                 3.15.0-rc5            3.15.0-rc5
                                      mmotm           lockpage
btrfs Max      ddtime      0.5863 (  0.00%)      0.5621 (  4.14%)
ext3  Max      ddtime      1.4870 (  0.00%)      1.4609 (  1.76%)
ext4  Max      ddtime      1.0440 (  0.00%)      1.0376 (  0.61%)
tmpfs Max      ddtime      0.3541 (  0.00%)      0.3486 (  1.54%)
xfs   Max      ddtime      0.4995 (  0.00%)      0.4834 (  3.21%)

A separate run with profiles showed this

     samples percentage
ext3  225851    2.3180  vmlinux-3.15.0-rc5-mmotm       test_clear_page_writeback
ext3  106848    1.0966  vmlinux-3.15.0-rc5-mmotm       __wake_up_bit
ext3   71849    0.7374  vmlinux-3.15.0-rc5-mmotm       page_waitqueue
ext3   40319    0.4138  vmlinux-3.15.0-rc5-mmotm       unlock_page
ext3   26243    0.2693  vmlinux-3.15.0-rc5-mmotm       end_page_writeback
ext3  178777    1.7774  vmlinux-3.15.0-rc5-lockpage test_clear_page_writeback
ext3   67702    0.6731  vmlinux-3.15.0-rc5-lockpage unlock_page
ext3   22357    0.2223  vmlinux-3.15.0-rc5-lockpage end_page_writeback
ext3   11131    0.1107  vmlinux-3.15.0-rc5-lockpage __wake_up_bit
ext3    6360    0.0632  vmlinux-3.15.0-rc5-lockpage __wake_up_page_bit
ext3    1660    0.0165  vmlinux-3.15.0-rc5-lockpage page_waitqueue

The profiles show a clear reduction in waitqueue and wakeup functions. Note
that end_page_writeback costs the same as the savings there are due
to reduced calls to __wake_up_bit and page_waitqueue so there is no
obvious direct savings. The cost of unlock_page is higher as it's checking
PageWaiters but it is offset by reduced numbers of calls to page_waitqueue
and _wake_up_bit. There is a similar story told for each of the filesystems.
Note that for workloads that contend heavily on the page lock that
unlock_page may increase in cost as it has to clear PG_waiters so while
the typical case should be much faster, the worst case costs are now higher.

This is also reflected in the time taken to mmap a range of pages.
These are the results for xfs only but the other filesystems tell a
similar story.

                       3.15.0-rc5            3.15.0-rc5
                            mmotm           lockpage
Procs 107M     423.0000 (  0.00%)    409.0000 (  3.31%)
Procs 214M     847.0000 (  0.00%)    823.0000 (  2.83%)
Procs 322M    1296.0000 (  0.00%)   1232.0000 (  4.94%)
Procs 429M    1692.0000 (  0.00%)   1644.0000 (  2.84%)
Procs 536M    2137.0000 (  0.00%)   2057.0000 (  3.74%)
Procs 644M    2542.0000 (  0.00%)   2472.0000 (  2.75%)
Procs 751M    2953.0000 (  0.00%)   2872.0000 (  2.74%)
Procs 859M    3360.0000 (  0.00%)   3310.0000 (  1.49%)
Procs 966M    3770.0000 (  0.00%)   3724.0000 (  1.22%)
Procs 1073M   4220.0000 (  0.00%)   4114.0000 (  2.51%)
Procs 1181M   4638.0000 (  0.00%)   4546.0000 (  1.98%)
Procs 1288M   5038.0000 (  0.00%)   4940.0000 (  1.95%)
Procs 1395M   5481.0000 (  0.00%)   5431.0000 (  0.91%)
Procs 1503M   5940.0000 (  0.00%)   5832.0000 (  1.82%)
Procs 1610M   6316.0000 (  0.00%)   6204.0000 (  1.77%)
Procs 1717M   6749.0000 (  0.00%)   6799.0000 ( -0.74%)
Procs 1825M   7323.0000 (  0.00%)   7082.0000 (  3.29%)
Procs 1932M   7694.0000 (  0.00%)   7452.0000 (  3.15%)
Procs 2040M   8079.0000 (  0.00%)   7927.0000 (  1.88%)
Procs 2147M   8495.0000 (  0.00%)   8360.0000 (  1.59%)

   samples percentage
xfs  78334    1.3089  vmlinux-3.15.0-rc5-mmotm          page_waitqueue
xfs  55910    0.9342  vmlinux-3.15.0-rc5-mmotm          unlock_page
xfs  45120    0.7539  vmlinux-3.15.0-rc5-mmotm          __wake_up_bit
xfs  41414    0.6920  vmlinux-3.15.0-rc5-mmotm          test_clear_page_writeback
xfs   4823    0.0806  vmlinux-3.15.0-rc5-mmotm          end_page_writeback
xfs 100864    1.7063  vmlinux-3.15.0-rc5-lockpage    unlock_page
xfs  52547    0.8889  vmlinux-3.15.0-rc5-lockpage    test_clear_page_writeback
xfs   5031    0.0851  vmlinux-3.15.0-rc5-lockpage    end_page_writeback
xfs   1938    0.0328  vmlinux-3.15.0-rc5-lockpage    __wake_up_bit
xfs      9   1.5e-04  vmlinux-3.15.0-rc5-lockpage    __wake_up_page_bit
xfs      7   1.2e-04  vmlinux-3.15.0-rc5-lockpage    page_waitqueue

[jack@suse.cz: Fix add_page_wait_queue]
[mhocko@suse.cz: Use sleep_on_page_killable in __wait_on_page_locked_killable]
[steiner@sgi.com: Do not update struct page unnecessarily]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/page-flags.h |  18 ++++++
 include/linux/wait.h       |   8 +++
 kernel/sched/wait.c        | 145 +++++++++++++++++++++++++++++++++++----------
 mm/filemap.c               |  25 ++++----
 mm/page_alloc.c            |   1 +
 mm/swap.c                  |  12 ++++
 mm/vmscan.c                |   7 +++
 7 files changed, 173 insertions(+), 43 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 7baf0fe..b697e4f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -87,6 +87,7 @@ enum pageflags {
 	PG_private_2,		/* If pagecache, has fs aux data */
 	PG_writeback,		/* Page is under writeback */
 #ifdef CONFIG_PAGEFLAGS_EXTENDED
+	PG_waiters,		/* Page has PG_locked waiters. */
 	PG_head,		/* A head page */
 	PG_tail,		/* A tail page */
 #else
@@ -213,6 +214,22 @@ PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
 
 __PAGEFLAG(SlobFree, slob_free)
 
+#ifdef CONFIG_PAGEFLAGS_EXTENDED
+PAGEFLAG(Waiters, waiters) __CLEARPAGEFLAG(Waiters, waiters)
+	TESTCLEARFLAG(Waiters, waiters)
+#define __PG_WAITERS		(1 << PG_waiters)
+#else
+/* Always fallback to slow path on 32-bit */
+static inline bool PageWaiters(struct page *page)
+{
+	return true;
+}
+static inline void __ClearPageWaiters(struct page *page) {}
+static inline void ClearPageWaiters(struct page *page) {}
+static inline void SetPageWaiters(struct page *page) {}
+#define __PG_WAITERS		0
+#endif /* CONFIG_PAGEFLAGS_EXTENDED */
+
 /*
  * Private page markings that may be used by the filesystem that owns the page
  * for its own purposes.
@@ -509,6 +526,7 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
 	 1 << PG_writeback | 1 << PG_reserved | \
 	 1 << PG_slab	 | 1 << PG_swapcache | 1 << PG_active | \
 	 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
+	 __PG_WAITERS | \
 	 __PG_COMPOUND_LOCK)
 
 /*
diff --git a/include/linux/wait.h b/include/linux/wait.h
index bd68819..9226724 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -141,14 +141,21 @@ __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
 	list_del(&old->task_list);
 }
 
+struct page;
+
 void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
 void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key);
 void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
 void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr);
 void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr);
 void __wake_up_bit(wait_queue_head_t *, void *, int);
+void __wake_up_page_bit(wait_queue_head_t *, struct page *page, void *, int);
 int __wait_on_bit(wait_queue_head_t *, struct wait_bit_queue *, int (*)(void *), unsigned);
+int __wait_on_page_bit(wait_queue_head_t *, struct wait_bit_queue *,
+				struct page *page, int (*)(void *), unsigned);
 int __wait_on_bit_lock(wait_queue_head_t *, struct wait_bit_queue *, int (*)(void *), unsigned);
+int __wait_on_page_bit_lock(wait_queue_head_t *, struct wait_bit_queue *,
+				struct page *page, int (*)(void *), unsigned);
 void wake_up_bit(void *, int);
 void wake_up_atomic_t(atomic_t *);
 int out_of_line_wait_on_bit(void *, int, int (*)(void *), unsigned);
@@ -822,6 +829,7 @@ void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
 void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
 long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
 void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
+void finish_wait_page(wait_queue_head_t *q, wait_queue_t *wait, struct page *page);
 void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, unsigned int mode, void *key);
 int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 0ffa20a..73cb8c6 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -167,31 +167,47 @@ EXPORT_SYMBOL_GPL(__wake_up_sync);	/* For internal use only */
  * stops them from bleeding out - it would still allow subsequent
  * loads to move into the critical region).
  */
-void
-prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
+static __always_inline void
+__prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait,
+			struct page *page, int state, bool exclusive)
 {
 	unsigned long flags;
 
-	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
 	spin_lock_irqsave(&q->lock, flags);
-	if (list_empty(&wait->task_list))
-		__add_wait_queue(q, wait);
+
+	/*
+	 * pages are hashed on a waitqueue that is expensive to lookup.
+	 * __wait_on_page_bit and __wait_on_page_bit_lock pass in a page
+	 * to set PG_waiters here. A PageWaiters() can then be used at
+	 * unlock time or when writeback completes to detect if there
+	 * are any potential waiters that justify a lookup.
+	 */
+	if (page && !PageWaiters(page))
+		SetPageWaiters(page);
+	if (list_empty(&wait->task_list)) {
+		if (exclusive) {
+			wait->flags |= WQ_FLAG_EXCLUSIVE;
+			__add_wait_queue_tail(q, wait);
+		} else {
+			wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+			__add_wait_queue(q, wait);
+		}
+	}
 	set_current_state(state);
 	spin_unlock_irqrestore(&q->lock, flags);
 }
+
+void
+prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
+{
+	return __prepare_to_wait(q, wait, NULL, state, false);
+}
 EXPORT_SYMBOL(prepare_to_wait);
 
 void
 prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
 {
-	unsigned long flags;
-
-	wait->flags |= WQ_FLAG_EXCLUSIVE;
-	spin_lock_irqsave(&q->lock, flags);
-	if (list_empty(&wait->task_list))
-		__add_wait_queue_tail(q, wait);
-	set_current_state(state);
-	spin_unlock_irqrestore(&q->lock, flags);
+	return __prepare_to_wait(q, wait, NULL, state, true);
 }
 EXPORT_SYMBOL(prepare_to_wait_exclusive);
 
@@ -219,16 +235,8 @@ long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
 }
 EXPORT_SYMBOL(prepare_to_wait_event);
 
-/**
- * finish_wait - clean up after waiting in a queue
- * @q: waitqueue waited on
- * @wait: wait descriptor
- *
- * Sets current thread back to running state and removes
- * the wait descriptor from the given waitqueue if still
- * queued.
- */
-void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
+static __always_inline void __finish_wait(wait_queue_head_t *q,
+			wait_queue_t *wait, struct page *page)
 {
 	unsigned long flags;
 
@@ -249,9 +257,33 @@ void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
 	if (!list_empty_careful(&wait->task_list)) {
 		spin_lock_irqsave(&q->lock, flags);
 		list_del_init(&wait->task_list);
+
+		/*
+		 * Clear PG_waiters if the waitqueue is no longer active. There
+		 * is no guarantee that a page with no waiters will get cleared
+		 * as there may be unrelated pages hashed to sleep on the same
+		 * queue. Accurate detection would require a counter but
+		 * collisions are expected to be rare.
+		 */
+		if (page && !waitqueue_active(q))
+			ClearPageWaiters(page);
 		spin_unlock_irqrestore(&q->lock, flags);
 	}
 }
+
+/**
+ * finish_wait - clean up after waiting in a queue
+ * @q: waitqueue waited on
+ * @wait: wait descriptor
+ *
+ * Sets current thread back to running state and removes
+ * the wait descriptor from the given waitqueue if still
+ * queued.
+ */
+void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
+{
+	return __finish_wait(q, wait, NULL);
+}
 EXPORT_SYMBOL(finish_wait);
 
 /**
@@ -313,24 +345,39 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *arg)
 EXPORT_SYMBOL(wake_bit_function);
 
 /*
- * To allow interruptible waiting and asynchronous (i.e. nonblocking)
- * waiting, the actions of __wait_on_bit() and __wait_on_bit_lock() are
- * permitted return codes. Nonzero return codes halt waiting and return.
+ * waits on a bit to be cleared (see wait_on_bit in wait.h for details.
+ * A page is optionally provided when used to wait on the PG_locked or
+ * PG_writeback bit. By setting PG_waiters a lookup of the waitqueue
+ * can be avoided during unlock_page or end_page_writeback.
  */
 int __sched
-__wait_on_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
+__wait_on_page_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
+			struct page *page,
 			int (*action)(void *), unsigned mode)
 {
 	int ret = 0;
 
 	do {
-		prepare_to_wait(wq, &q->wait, mode);
+		__prepare_to_wait(wq, &q->wait, page, mode, false);
 		if (test_bit(q->key.bit_nr, q->key.flags))
 			ret = (*action)(q->key.flags);
 	} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
-	finish_wait(wq, &q->wait);
+	__finish_wait(wq, &q->wait, page);
 	return ret;
 }
+
+/*
+ * To allow interruptible waiting and asynchronous (i.e. nonblocking)
+ * waiting, the actions of __wait_on_bit() and __wait_on_bit_lock() are
+ * permitted return codes. Nonzero return codes halt waiting and return.
+ */
+int __sched
+__wait_on_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
+			int (*action)(void *), unsigned mode)
+{
+	return __wait_on_page_bit(wq, q, NULL, action, mode);
+}
+
 EXPORT_SYMBOL(__wait_on_bit);
 
 int __sched out_of_line_wait_on_bit(void *word, int bit,
@@ -344,13 +391,14 @@ int __sched out_of_line_wait_on_bit(void *word, int bit,
 EXPORT_SYMBOL(out_of_line_wait_on_bit);
 
 int __sched
-__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
+__wait_on_page_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
+			struct page *page,
 			int (*action)(void *), unsigned mode)
 {
 	do {
 		int ret;
 
-		prepare_to_wait_exclusive(wq, &q->wait, mode);
+		__prepare_to_wait(wq, &q->wait, page, mode, true);
 		if (!test_bit(q->key.bit_nr, q->key.flags))
 			continue;
 		ret = action(q->key.flags);
@@ -359,9 +407,16 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 		abort_exclusive_wait(wq, &q->wait, mode, &q->key);
 		return ret;
 	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
-	finish_wait(wq, &q->wait);
+	__finish_wait(wq, &q->wait, page);
 	return 0;
 }
+
+int __sched
+__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
+			int (*action)(void *), unsigned mode)
+{
+	return __wait_on_page_bit_lock(wq, q, NULL, action, mode);
+}
 EXPORT_SYMBOL(__wait_on_bit_lock);
 
 int __sched out_of_line_wait_on_bit_lock(void *word, int bit,
@@ -380,6 +435,32 @@ void __wake_up_bit(wait_queue_head_t *wq, void *word, int bit)
 	if (waitqueue_active(wq))
 		__wake_up(wq, TASK_NORMAL, 1, &key);
 }
+
+void __wake_up_page_bit(wait_queue_head_t *wqh, struct page *page, void *word, int bit)
+{
+	struct wait_bit_key key = __WAIT_BIT_KEY_INITIALIZER(word, bit);
+	unsigned long flags;
+
+	/* If there is no PG_waiters bit, always take the slow path */
+	if (!__PG_WAITERS && waitqueue_active(wq)) {
+		__wake_up(wq, TASK_NORMAL, 1, &key);
+		return;
+	}
+
+	/*
+	 * Unlike __wake_up_bit it is necessary to check waitqueue_active to be
+	 * checked under the wqh->lock to avoid races with parallel additions
+	 * to the waitqueue. Otherwise races could result in lost wakeups
+	 */
+	spin_lock_irqsave(&wqh->lock, flags);
+	if (waitqueue_active(wqh))
+		__wake_up_common(wqh, TASK_NORMAL, 1, 0, &key);
+	else
+		ClearPageWaiters(page);
+	spin_unlock_irqrestore(&wqh->lock, flags);
+}
+
+
 EXPORT_SYMBOL(__wake_up_bit);
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 263cffe..07633a4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -682,9 +682,9 @@ static wait_queue_head_t *page_waitqueue(struct page *page)
 	return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
 }
 
-static inline void wake_up_page(struct page *page, int bit)
+static inline void wake_up_page(struct page *page, int bit_nr)
 {
-	__wake_up_bit(page_waitqueue(page), &page->flags, bit);
+	__wake_up_page_bit(page_waitqueue(page), page, &page->flags, bit_nr);
 }
 
 void wait_on_page_bit(struct page *page, int bit_nr)
@@ -692,8 +692,8 @@ void wait_on_page_bit(struct page *page, int bit_nr)
 	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
 
 	if (test_bit(bit_nr, &page->flags))
-		__wait_on_bit(page_waitqueue(page), &wait, sleep_on_page,
-							TASK_UNINTERRUPTIBLE);
+		__wait_on_page_bit(page_waitqueue(page), &wait, page,
+					sleep_on_page, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(wait_on_page_bit);
 
@@ -704,7 +704,7 @@ int wait_on_page_bit_killable(struct page *page, int bit_nr)
 	if (!test_bit(bit_nr, &page->flags))
 		return 0;
 
-	return __wait_on_bit(page_waitqueue(page), &wait,
+	return __wait_on_page_bit(page_waitqueue(page), &wait, page,
 			     sleep_on_page_killable, TASK_KILLABLE);
 }
 
@@ -743,7 +743,8 @@ void unlock_page(struct page *page)
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	clear_bit_unlock(PG_locked, &page->flags);
 	smp_mb__after_atomic();
-	wake_up_page(page, PG_locked);
+	if (unlikely(PageWaiters(page)))
+		wake_up_page(page, PG_locked);
 }
 EXPORT_SYMBOL(unlock_page);
 
@@ -769,7 +770,8 @@ void end_page_writeback(struct page *page)
 		BUG();
 
 	smp_mb__after_atomic();
-	wake_up_page(page, PG_writeback);
+	if (unlikely(PageWaiters(page)))
+		wake_up_page(page, PG_writeback);
 }
 EXPORT_SYMBOL(end_page_writeback);
 
@@ -806,8 +808,8 @@ void __lock_page(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
-	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
-							TASK_UNINTERRUPTIBLE);
+	__wait_on_page_bit_lock(page_waitqueue(page), &wait, page,
+					sleep_on_page, TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(__lock_page);
 
@@ -815,9 +817,10 @@ int __lock_page_killable(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
-	return __wait_on_bit_lock(page_waitqueue(page), &wait,
-					sleep_on_page_killable, TASK_KILLABLE);
+	return __wait_on_page_bit_lock(page_waitqueue(page), &wait, page,
+					sleep_on_page, TASK_KILLABLE);
 }
+
 EXPORT_SYMBOL_GPL(__lock_page_killable);
 
 int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd1f005..ebb947d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6603,6 +6603,7 @@ static const struct trace_print_flags pageflag_names[] = {
 	{1UL << PG_private_2,		"private_2"	},
 	{1UL << PG_writeback,		"writeback"	},
 #ifdef CONFIG_PAGEFLAGS_EXTENDED
+	{1UL << PG_waiters,		"waiters"	},
 	{1UL << PG_head,		"head"		},
 	{1UL << PG_tail,		"tail"		},
 #else
diff --git a/mm/swap.c b/mm/swap.c
index 9e8e347..1581dbf 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -67,6 +67,10 @@ static void __page_cache_release(struct page *page)
 static void __put_single_page(struct page *page)
 {
 	__page_cache_release(page);
+
+	/* See release_pages on why this clear may be necessary */
+	__ClearPageWaiters(page);
+
 	free_hot_cold_page(page, false);
 }
 
@@ -916,6 +920,14 @@ void release_pages(struct page **pages, int nr, bool cold)
 		/* Clear Active bit in case of parallel mark_page_accessed */
 		__ClearPageActive(page);
 
+		/*
+		 * pages are hashed on a waitqueue so there may be collisions.
+		 * When waiters are woken the waitqueue is checked but
+		 * unrelated pages on the queue can leave the bit set. Clear
+		 * it here if that happens.
+		 */
+		__ClearPageWaiters(page);
+
 		list_add(&page->lru, &pages_to_free);
 	}
 	if (zone)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7f85041..d7a4969 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1096,6 +1096,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * waiting on the page lock, because there are no references.
 		 */
 		__clear_page_locked(page);
+
+		/* See release_pages on why this clear may be necessary */
+		__ClearPageWaiters(page);
 free_it:
 		nr_reclaimed++;
 
@@ -1427,6 +1430,8 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
+			/* See release_pages on why this clear may be necessary */
+			__ClearPageWaiters(page);
 			del_page_from_lru_list(page, lruvec, lru);
 
 			if (unlikely(PageCompound(page))) {
@@ -1650,6 +1655,8 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
+			/* See release_pages on why this clear may be necessary */
+			__ClearPageWaiters(page);
 			del_page_from_lru_list(page, lruvec, lru);
 
 			if (unlikely(PageCompound(page))) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-05-22 10:40 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-13  9:45 [PATCH 00/19] Misc page alloc, shmem, mark_page_accessed and page_waitqueue optimisations v3r33 Mel Gorman
2014-05-13  9:45 ` [PATCH 01/19] mm: page_alloc: Do not update zlc unless the zlc is active Mel Gorman
2014-05-13  9:45 ` [PATCH 02/19] mm: page_alloc: Do not treat a zone that cannot be used for dirty pages as "full" Mel Gorman
2014-05-13  9:45 ` [PATCH 03/19] jump_label: Expose the reference count Mel Gorman
2014-05-13  9:45 ` [PATCH 04/19] mm: page_alloc: Use jump labels to avoid checking number_of_cpusets Mel Gorman
2014-05-13 10:58   ` Peter Zijlstra
2014-05-13 12:28     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 05/19] mm: page_alloc: Calculate classzone_idx once from the zonelist ref Mel Gorman
2014-05-13 22:25   ` Andrew Morton
2014-05-14  6:32     ` Mel Gorman
2014-05-14 20:29     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 06/19] mm: page_alloc: Only check the zone id check if pages are buddies Mel Gorman
2014-05-13  9:45 ` [PATCH 07/19] mm: page_alloc: Only check the alloc flags and gfp_mask for dirty once Mel Gorman
2014-05-13  9:45 ` [PATCH 08/19] mm: page_alloc: Take the ALLOC_NO_WATERMARK check out of the fast path Mel Gorman
2014-05-13  9:45 ` [PATCH 09/19] mm: page_alloc: Use word-based accesses for get/set pageblock bitmaps Mel Gorman
2014-05-22  9:24   ` Vlastimil Babka
2014-05-22 18:23     ` Andrew Morton
2014-05-22 18:45       ` Vlastimil Babka
2014-05-13  9:45 ` [PATCH 10/19] mm: page_alloc: Reduce number of times page_to_pfn is called Mel Gorman
2014-05-13 13:27   ` Vlastimil Babka
2014-05-13 14:09     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 11/19] mm: page_alloc: Lookup pageblock migratetype with IRQs enabled during free Mel Gorman
2014-05-13 13:36   ` Vlastimil Babka
2014-05-13 14:23     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 12/19] mm: page_alloc: Use unsigned int for order in more places Mel Gorman
2014-05-13  9:45 ` [PATCH 13/19] mm: page_alloc: Convert hot/cold parameter and immediate callers to bool Mel Gorman
2014-05-13  9:45 ` [PATCH 14/19] mm: shmem: Avoid atomic operation during shmem_getpage_gfp Mel Gorman
2014-05-13  9:45 ` [PATCH 15/19] mm: Do not use atomic operations when releasing pages Mel Gorman
2014-05-13  9:45 ` [PATCH 16/19] mm: Do not use unnecessary atomic operations when adding pages to the LRU Mel Gorman
2014-05-13  9:45 ` [PATCH 17/19] fs: buffer: Do not use unnecessary atomic operations when discarding buffers Mel Gorman
2014-05-13 11:09   ` Peter Zijlstra
2014-05-13 12:50     ` Mel Gorman
2014-05-13 13:49       ` Jan Kara
2014-05-13 14:30         ` Mel Gorman
2014-05-13 14:01       ` Peter Zijlstra
2014-05-13 14:46         ` Mel Gorman
2014-05-13 13:50   ` Jan Kara
2014-05-13 22:29   ` Andrew Morton
2014-05-14  6:12     ` Mel Gorman
2014-05-13  9:45 ` [PATCH 18/19] mm: Non-atomically mark page accessed during page cache allocation where possible Mel Gorman
2014-05-13 14:29   ` Theodore Ts'o
2014-05-20 15:49   ` [PATCH] mm: non-atomically mark page accessed during page cache allocation where possible -fix Mel Gorman
2014-05-20 19:34     ` Andrew Morton
2014-05-21 12:09       ` Mel Gorman
2014-05-21 22:11         ` Andrew Morton
2014-05-22  0:07           ` Mel Gorman
2014-05-22  5:35       ` Prabhakar Lad
2014-05-13  9:45 ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Mel Gorman
2014-05-13 12:53   ` Mel Gorman
2014-05-13 14:17     ` Peter Zijlstra
2014-05-13 15:27       ` Paul E. McKenney
2014-05-13 15:44         ` Peter Zijlstra
2014-05-13 16:14           ` Paul E. McKenney
2014-05-13 18:57             ` Oleg Nesterov
2014-05-13 20:24               ` Paul E. McKenney
2014-05-14 14:25                 ` Oleg Nesterov
2014-05-13 18:22           ` Oleg Nesterov
2014-05-13 18:18         ` Oleg Nesterov
2014-05-13 18:24           ` Peter Zijlstra
2014-05-13 18:52           ` Paul E. McKenney
2014-05-13 19:31             ` Oleg Nesterov
2014-05-13 20:32               ` Paul E. McKenney
2014-05-14 16:11       ` Oleg Nesterov
2014-05-14 16:17         ` Peter Zijlstra
2014-05-16 13:51           ` [PATCH 0/1] ptrace: task_clear_jobctl_trapping()->wake_up_bit() needs mb() Oleg Nesterov
2014-05-16 13:51             ` [PATCH 1/1] " Oleg Nesterov
2014-05-21  9:29               ` Peter Zijlstra
2014-05-21 19:19                 ` Andrew Morton
2014-05-21 19:18             ` [PATCH 0/1] " Andrew Morton
2014-05-14 19:29         ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Oleg Nesterov
2014-05-14 20:53           ` Mel Gorman
2014-05-15 10:48           ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v4 Mel Gorman
2014-05-15 13:20             ` Peter Zijlstra
2014-05-15 13:29               ` Peter Zijlstra
2014-05-15 15:34               ` Oleg Nesterov
2014-05-15 15:45                 ` Peter Zijlstra
2014-05-15 16:18               ` Mel Gorman
2014-05-15 15:03             ` Oleg Nesterov
2014-05-15 21:24             ` Andrew Morton
2014-05-21 12:15               ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v5 Mel Gorman
2014-05-21 13:02                 ` Peter Zijlstra
2014-05-21 15:33                   ` Mel Gorman
2014-05-21 16:08                     ` Peter Zijlstra
2014-05-21 21:26                 ` Andrew Morton
2014-05-21 21:33                   ` Peter Zijlstra
2014-05-21 21:50                     ` Andrew Morton
2014-05-22  0:07                       ` Mel Gorman
2014-05-22  7:20                         ` Peter Zijlstra
2014-05-22 10:40                           ` Mel Gorman [this message]
2014-05-22 10:56                             ` [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v7 Peter Zijlstra
2014-05-22 13:00                               ` Mel Gorman
2014-05-22 14:40                               ` Mel Gorman
2014-05-22 15:04                                 ` Peter Zijlstra
2014-05-22 15:36                                   ` Mel Gorman
2014-05-22 16:58                                   ` [PATCH] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath v8 Mel Gorman
2014-05-22  6:45                       ` [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v5 Peter Zijlstra
2014-05-22  8:46                         ` Mel Gorman
2014-05-22 17:47                           ` Andrew Morton
2014-05-22 19:53                             ` Mel Gorman
2014-05-21 23:35                   ` Mel Gorman
2014-05-13 16:52   ` [PATCH 19/19] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath Peter Zijlstra
2014-05-14  7:31     ` Mel Gorman
2014-05-19  8:57 ` [PATCH] mm: Avoid unnecessary atomic operations during end_page_writeback Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140522104051.GE23991@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).