From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg KH <gregkh@linuxfoundation.org>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Mel Gorman <mgorman@suse.de>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Minchan Kim <minchan.kim@gmail.com>,
Dave Jones <davej@redhat.com>, Jan Kara <jack@suse.cz>,
Andy Isaacson <adi@hexapodia.org>, Nai Xia <nai.xia@gmail.com>,
Johannes Weiner <jweiner@redhat.com>
Subject: [ 23/40] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage
Date: Thu, 26 Jul 2012 14:29:41 -0700
Message-ID: <20120726211413.199501784@linuxfoundation.org>
In-Reply-To: <20120726211411.164006056@linuxfoundation.org>
3.0-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman <mgorman@suse.de>
commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page
aging information by reducing LRU list churning had the side-effect
of reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time. Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates. Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check:

	if (PageDirty(page) && !sync &&
	    mapping->a_ops->migratepage != migrate_page)
		rc = -EBUSY;
This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It is
the responsibility of the callback to fail gracefully if migration would
block.
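
As a rough illustration of the new contract (a sketch for reviewers, not
part of this patch; the my_fs_* helpers are hypothetical), a callback
honouring the "sync" parameter trylocks its private state and bails out
rather than blocking in the async case:

	static int my_fs_migratepage(struct address_space *mapping,
			struct page *newpage, struct page *page, bool sync)
	{
		/* my_fs_trylock_state() is a hypothetical non-blocking lock */
		if (!my_fs_trylock_state(page)) {
			if (!sync)
				return -EAGAIN;	/* would block; fail gracefully */
			my_fs_lock_state(page);	/* sync callers may sleep */
		}
		return migrate_page(mapping, newpage, page, sync);
	}
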
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/disk-io.c      |    4 -
 fs/hugetlbfs/inode.c    |    3 -
 fs/nfs/internal.h       |    2 
 fs/nfs/write.c          |    4 -
 include/linux/fs.h      |    9 ++-
 include/linux/migrate.h |    2 
 mm/migrate.c            |  129 ++++++++++++++++++++++++++++++++++--------
 7 files changed, 106 insertions(+), 47 deletions(-)
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -801,7 +801,7 @@ static int btree_submit_bio_hook(struct
 
 #ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
-			struct page *newpage, struct page *page)
+			struct page *newpage, struct page *page, bool sync)
 {
 	/*
 	 * we can't safely write a btree page from here,
@@ -816,7 +816,7 @@ static int btree_migratepage(struct addr
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, sync);
 }
 #endif
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -568,7 +568,8 @@ static int hugetlbfs_set_page_dirty(stru
 }
 
 static int hugetlbfs_migrate_page(struct address_space *mapping,
-				struct page *newpage, struct page *page)
+				struct page *newpage, struct page *page,
+				bool sync)
 {
 	int rc;
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -315,7 +315,7 @@ void nfs_commit_release_pages(struct nfs
 
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
-		struct page *, struct page *);
+		struct page *, struct page *, bool);
 #else
 #define nfs_migrate_page NULL
 #endif
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1662,7 +1662,7 @@ out_error:
 
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
-		struct page *page)
+		struct page *page, bool sync)
 {
 	/*
 	 * If PagePrivate is set, then the page is currently associated with
@@ -1677,7 +1677,7 @@ int nfs_migrate_page(struct address_spac
 
 	nfs_fscache_release_page(page, GFP_KERNEL);
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, sync);
 }
 #endif
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -607,9 +607,12 @@ struct address_space_operations {
 			loff_t offset, unsigned long nr_segs);
 	int (*get_xip_mem)(struct address_space *, pgoff_t, int,
 				void **, unsigned long *);
-	/* migrate the contents of a page to the specified target */
+	/*
+	 * migrate the contents of a page to the specified target. If sync
+	 * is false, it must not block.
+	 */
 	int (*migratepage) (struct address_space *,
-			struct page *, struct page *);
+			struct page *, struct page *, bool);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
 					unsigned long);
@@ -2478,7 +2481,7 @@ extern int generic_check_addressable(uns
 
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_page(struct address_space *,
-				struct page *, struct page *);
+				struct page *, struct page *, bool);
 #else
 #define buffer_migrate_page NULL
 #endif
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -11,7 +11,7 @@ typedef struct page *new_page_t(struct p
 
 extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
-			struct page *, struct page *);
+			struct page *, struct page *, bool);
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
 			bool sync);
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -220,6 +220,55 @@ out:
 	pte_unmap_unlock(ptep, ptl);
 }
 
+#ifdef CONFIG_BLOCK
+/* Returns true if all buffers are successfully locked */
+static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+{
+	struct buffer_head *bh = head;
+
+	/* Simple case, sync compaction */
+	if (sync) {
+		do {
+			get_bh(bh);
+			lock_buffer(bh);
+			bh = bh->b_this_page;
+
+		} while (bh != head);
+
+		return true;
+	}
+
+	/* async case, we cannot block on lock_buffer so use trylock_buffer */
+	do {
+		get_bh(bh);
+		if (!trylock_buffer(bh)) {
+			/*
+			 * We failed to lock the buffer and cannot stall in
+			 * async migration. Release the taken locks
+			 */
+			struct buffer_head *failed_bh = bh;
+			put_bh(failed_bh);
+			bh = head;
+			while (bh != failed_bh) {
+				unlock_buffer(bh);
+				put_bh(bh);
+				bh = bh->b_this_page;
+			}
+			return false;
+		}
+
+		bh = bh->b_this_page;
+	} while (bh != head);
+	return true;
+}
+#else
+static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
+						bool sync)
+{
+	return true;
+}
+#endif /* CONFIG_BLOCK */
+
 /*
  * Replace the page in the mapping.
  *
@@ -229,7 +278,8 @@ out:
  * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
  */
 static int migrate_page_move_mapping(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page,
+		struct buffer_head *head, bool sync)
 {
 	int expected_count;
 	void **pslot;
@@ -259,6 +309,19 @@ static int migrate_page_move_mapping(str
 	}
 
 	/*
+	 * In the async migration case of moving a page with buffers, lock the
+	 * buffers using trylock before the mapping is moved. If the mapping
+	 * was moved, we later failed to lock the buffers and could not move
+	 * the mapping back due to an elevated page count, we would have to
+	 * block waiting on other references to be dropped.
+	 */
+	if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+		page_unfreeze_refs(page, expected_count);
+		spin_unlock_irq(&mapping->tree_lock);
+		return -EAGAIN;
+	}
+
+	/*
 	 * Now we know that no one else is looking at the page.
 	 */
 	get_page(newpage);	/* add cache reference */
@@ -415,13 +478,13 @@ EXPORT_SYMBOL(fail_migrate_page);
  * Pages are locked upon entry and exit.
  */
 int migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page, bool sync)
 {
 	int rc;
 
 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
 
-	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
 
 	if (rc)
 		return rc;
@@ -438,28 +501,28 @@ EXPORT_SYMBOL(migrate_page);
  * exist.
  */
 int buffer_migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page, bool sync)
 {
 	struct buffer_head *bh, *head;
 	int rc;
 
 	if (!page_has_buffers(page))
-		return migrate_page(mapping, newpage, page);
+		return migrate_page(mapping, newpage, page, sync);
 
 	head = page_buffers(page);
 
-	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
 
 	if (rc)
 		return rc;
 
-	bh = head;
-	do {
-		get_bh(bh);
-		lock_buffer(bh);
-		bh = bh->b_this_page;
-
-	} while (bh != head);
+	/*
+	 * In the async case, migrate_page_move_mapping locked the buffers
+	 * with an IRQ-safe spinlock held. In the sync case, the buffers
+	 * need to be locked now
+	 */
+	if (sync)
+		BUG_ON(!buffer_migrate_lock_buffers(head, sync));
 
 	ClearPagePrivate(page);
 	set_page_private(newpage, page_private(page));
@@ -536,10 +599,13 @@ static int writeout(struct address_spac
  * Default handling if a filesystem does not provide a migration function.
  */
 static int fallback_migrate_page(struct address_space *mapping,
-	struct page *newpage, struct page *page)
+	struct page *newpage, struct page *page, bool sync)
 {
-	if (PageDirty(page))
+	if (PageDirty(page)) {
+		if (!sync)
+			return -EBUSY;
 		return writeout(mapping, page);
+	}
 
 	/*
 	 * Buffers may be managed in a filesystem specific way.
@@ -549,7 +615,7 @@ static int fallback_migrate_page(struct
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, sync);
 }
 
 /*
@@ -585,29 +651,18 @@ static int move_to_new_page(struct page
 
 	mapping = page_mapping(page);
 	if (!mapping)
-		rc = migrate_page(mapping, newpage, page);
-	else {
+		rc = migrate_page(mapping, newpage, page, sync);
+	else if (mapping->a_ops->migratepage)
 		/*
-		 * Do not writeback pages if !sync and migratepage is
-		 * not pointing to migrate_page() which is nonblocking
-		 * (swapcache/tmpfs uses migratepage = migrate_page).
+		 * Most pages have a mapping and most filesystems provide a
+		 * migratepage callback. Anonymous pages are part of swap
+		 * space which also has its own migratepage callback. This
+		 * is the most common path for page migration.
 		 */
-		if (PageDirty(page) && !sync &&
-		    mapping->a_ops->migratepage != migrate_page)
-			rc = -EBUSY;
-		else if (mapping->a_ops->migratepage)
-			/*
-			 * Most pages have a mapping and most filesystems
-			 * should provide a migration function. Anonymous
-			 * pages are part of swap space which also has its
-			 * own migration function. This is the most common
-			 * path for page migration.
-			 */
-			rc = mapping->a_ops->migratepage(mapping,
-							newpage, page);
-		else
-			rc = fallback_migrate_page(mapping, newpage, page);
-	}
+		rc = mapping->a_ops->migratepage(mapping,
+						newpage, page, sync);
+	else
+		rc = fallback_migrate_page(mapping, newpage, page, sync);
 
 	if (rc) {
 		newpage->mapping = NULL;
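
For context only (again a sketch, not part of the patch): with this in
place, asynchronous compaction can keep passing sync=false through the
existing migrate_pages() interface declared above, roughly as the
compact_zone() call site did in this era, and a dirty page whose
callback would block now fails fast with -EAGAIN or -EBUSY instead of
stalling a THP allocation:

	/*
	 * Illustrative call site: cc is the compact_control for this
	 * zone and cc->sync is false for asynchronous compaction.
	 */
	err = migrate_pages(&cc->migratepages, compaction_alloc,
			(unsigned long)cc, false, cc->sync);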