From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
Christoph Lameter <cl@linux.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
Davidlohr Bueso <dave@stgolabs.net>,
Oleg Nesterov <oleg@redhat.com>,
Sasha Levin <sasha.levin@oracle.com>,
Dmitry Vyukov <dvyukov@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Charles (Chas) Williams" <ciwillia@brocade.com>
Subject: [PATCH 3.14 02/29] mm: migrate dirty page without clear_page_dirty_for_io etc
Date: Sun, 14 Aug 2016 22:07:30 +0200
Message-ID: <20160814200731.521604418@linuxfoundation.org>
In-Reply-To: <20160814200731.375346059@linuxfoundation.org>
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins <hughd@google.com>
commit 42cb14b110a5698ccf26ce59c4441722605a3743 upstream.

clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty, later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.

No actual problems seen with this procedure recently, but if you look into
what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
achieving, it turns out to be nothing more than moving the PageDirty flag,
and its NR_FILE_DIRTY stat from one zone to another.

It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent of
the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).

Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same. Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree, an
un-page-locked set_page_dirty() might interfere (or perhaps that's just
not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).

But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().

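
Editorial illustration (not part of the upstream commit): the flag and
counter movement described above can be sketched as a small userspace
model. Everything here is a hypothetical stand-in: zone_model, page_model,
STAT_FILE_PAGES, STAT_FILE_DIRTY and move_dirty_model() are invented for
the sketch, and the account_dirty parameter only loosely stands in for the
mapping_cap_account_dirty(mapping) check. Locking, refcount freezing and
the radix tree are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical per-zone counters; not the kernel's zone stat items. */
enum { STAT_FILE_PAGES, STAT_FILE_DIRTY, STAT_COUNT };

/* Hypothetical stand-in for struct zone. */
struct zone_model {
	long stat[STAT_COUNT];
};

/* Hypothetical stand-in for struct page. */
struct page_model {
	struct zone_model *zone;
	bool dirty;
};

/*
 * Move the dirty flag from page to newpage, then shift the per-zone
 * counters only when the two pages belong to different zones.
 * account_dirty is an assumption modelling whether the mapping
 * accounts dirty pages at all.
 */
static void move_dirty_model(struct page_model *page,
			     struct page_model *newpage,
			     bool account_dirty)
{
	struct zone_model *oldzone = page->zone;
	struct zone_model *newzone = newpage->zone;
	bool dirty = page->dirty;

	/* Transfer the flag itself; no writeback or event accounting. */
	if (dirty) {
		page->dirty = false;
		newpage->dirty = true;
	}

	/* Same zone: the counters are already correct, so skip them. */
	if (newzone != oldzone) {
		oldzone->stat[STAT_FILE_PAGES]--;
		newzone->stat[STAT_FILE_PAGES]++;
		if (dirty && account_dirty) {
			oldzone->stat[STAT_FILE_DIRTY]--;
			newzone->stat[STAT_FILE_DIRTY]++;
		}
	}
}
```

In this model, migrating a dirty page between two zones moves one unit of
each counter from the old zone to the new one, while a same-zone migration
touches only the two flags.
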
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ciwillia@brocade.com: backported to 3.14: adjusted context]
Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/migrate.c | 51 +++++++++++++++++++++++++++++++--------------------
1 file changed, 31 insertions(+), 20 deletions(-)
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -30,6 +30,7 @@
#include <linux/mempolicy.h>
#include <linux/vmalloc.h>
#include <linux/security.h>
+#include <linux/backing-dev.h>
#include <linux/memcontrol.h>
#include <linux/syscalls.h>
#include <linux/hugetlb.h>
@@ -344,6 +345,8 @@ int migrate_page_move_mapping(struct add
struct buffer_head *head, enum migrate_mode mode,
int extra_count)
{
+ struct zone *oldzone, *newzone;
+ int dirty;
int expected_count = 1 + extra_count;
void **pslot;
@@ -354,6 +357,9 @@ int migrate_page_move_mapping(struct add
return MIGRATEPAGE_SUCCESS;
}
+ oldzone = page_zone(page);
+ newzone = page_zone(newpage);
+
spin_lock_irq(&mapping->tree_lock);
pslot = radix_tree_lookup_slot(&mapping->page_tree,
@@ -394,6 +400,13 @@ int migrate_page_move_mapping(struct add
set_page_private(newpage, page_private(page));
}
+ /* Move dirty while page refs frozen and newpage not yet exposed */
+ dirty = PageDirty(page);
+ if (dirty) {
+ ClearPageDirty(page);
+ SetPageDirty(newpage);
+ }
+
radix_tree_replace_slot(pslot, newpage);
/*
@@ -403,6 +416,9 @@ int migrate_page_move_mapping(struct add
*/
page_unfreeze_refs(page, expected_count - 1);
+ spin_unlock(&mapping->tree_lock);
+ /* Leave irq disabled to prevent preemption while updating stats */
+
/*
* If moved to a different zone then also account
* the page for that zone. Other VM counters will be
@@ -413,13 +429,19 @@ int migrate_page_move_mapping(struct add
* via NR_FILE_PAGES and NR_ANON_PAGES if they
* are mapped to swap space.
*/
- __dec_zone_page_state(page, NR_FILE_PAGES);
- __inc_zone_page_state(newpage, NR_FILE_PAGES);
- if (!PageSwapCache(page) && PageSwapBacked(page)) {
- __dec_zone_page_state(page, NR_SHMEM);
- __inc_zone_page_state(newpage, NR_SHMEM);
+ if (newzone != oldzone) {
+ __dec_zone_state(oldzone, NR_FILE_PAGES);
+ __inc_zone_state(newzone, NR_FILE_PAGES);
+ if (PageSwapBacked(page) && !PageSwapCache(page)) {
+ __dec_zone_state(oldzone, NR_SHMEM);
+ __inc_zone_state(newzone, NR_SHMEM);
+ }
+ if (dirty && mapping_cap_account_dirty(mapping)) {
+ __dec_zone_state(oldzone, NR_FILE_DIRTY);
+ __inc_zone_state(newzone, NR_FILE_DIRTY);
+ }
}
- spin_unlock_irq(&mapping->tree_lock);
+ local_irq_enable();
return MIGRATEPAGE_SUCCESS;
}
@@ -544,20 +566,9 @@ void migrate_page_copy(struct page *newp
if (PageMappedToDisk(page))
SetPageMappedToDisk(newpage);
- if (PageDirty(page)) {
- clear_page_dirty_for_io(page);
- /*
- * Want to mark the page and the radix tree as dirty, and
- * redo the accounting that clear_page_dirty_for_io undid,
- * but we can't use set_page_dirty because that function
- * is actually a signal that all of the page has become dirty.
- * Whereas only part of our page may be dirty.
- */
- if (PageSwapBacked(page))
- SetPageDirty(newpage);
- else
- __set_page_dirty_nobuffers(newpage);
- }
+ /* Move dirty on pages not done by migrate_page_move_mapping() */
+ if (PageDirty(page))
+ SetPageDirty(newpage);
/*
* Copy NUMA information to the new page, to prevent over-eager