stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Christoph Lameter <cl@linux.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Charles (Chas) Williams" <ciwillia@brocade.com>
Subject: [PATCH 3.14 02/29] mm: migrate dirty page without clear_page_dirty_for_io etc
Date: Sun, 14 Aug 2016 22:07:30 +0200	[thread overview]
Message-ID: <20160814200731.521604418@linuxfoundation.org> (raw)
In-Reply-To: <20160814200731.375346059@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit 42cb14b110a5698ccf26ce59c4441722605a3743 upstream.

clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty, later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.

No actual problems seen with this procedure recently, but if you look into
what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
achieving, it turns out to be nothing more than moving the PageDirty flag,
and its NR_FILE_DIRTY stat from one zone to another.

It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent of
the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).

Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same.  Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree, an
un-page-locked set_page_dirty() might interfere (or perhaps that's just
not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).

But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ciwillia@brocade.com: backported to 3.14: adjusted context]
Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/migrate.c |   51 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -30,6 +30,7 @@
 #include <linux/mempolicy.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
+#include <linux/backing-dev.h>
 #include <linux/memcontrol.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
@@ -344,6 +345,8 @@ int migrate_page_move_mapping(struct add
 		struct buffer_head *head, enum migrate_mode mode,
 		int extra_count)
 {
+	struct zone *oldzone, *newzone;
+	int dirty;
 	int expected_count = 1 + extra_count;
 	void **pslot;
 
@@ -354,6 +357,9 @@ int migrate_page_move_mapping(struct add
 		return MIGRATEPAGE_SUCCESS;
 	}
 
+	oldzone = page_zone(page);
+	newzone = page_zone(newpage);
+
 	spin_lock_irq(&mapping->tree_lock);
 
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
@@ -394,6 +400,13 @@ int migrate_page_move_mapping(struct add
 		set_page_private(newpage, page_private(page));
 	}
 
+	/* Move dirty while page refs frozen and newpage not yet exposed */
+	dirty = PageDirty(page);
+	if (dirty) {
+		ClearPageDirty(page);
+		SetPageDirty(newpage);
+	}
+
 	radix_tree_replace_slot(pslot, newpage);
 
 	/*
@@ -403,6 +416,9 @@ int migrate_page_move_mapping(struct add
 	 */
 	page_unfreeze_refs(page, expected_count - 1);
 
+	spin_unlock(&mapping->tree_lock);
+	/* Leave irq disabled to prevent preemption while updating stats */
+
 	/*
 	 * If moved to a different zone then also account
 	 * the page for that zone. Other VM counters will be
@@ -413,13 +429,19 @@ int migrate_page_move_mapping(struct add
 	 * via NR_FILE_PAGES and NR_ANON_PAGES if they
 	 * are mapped to swap space.
 	 */
-	__dec_zone_page_state(page, NR_FILE_PAGES);
-	__inc_zone_page_state(newpage, NR_FILE_PAGES);
-	if (!PageSwapCache(page) && PageSwapBacked(page)) {
-		__dec_zone_page_state(page, NR_SHMEM);
-		__inc_zone_page_state(newpage, NR_SHMEM);
+	if (newzone != oldzone) {
+		__dec_zone_state(oldzone, NR_FILE_PAGES);
+		__inc_zone_state(newzone, NR_FILE_PAGES);
+		if (PageSwapBacked(page) && !PageSwapCache(page)) {
+			__dec_zone_state(oldzone, NR_SHMEM);
+			__inc_zone_state(newzone, NR_SHMEM);
+		}
+		if (dirty && mapping_cap_account_dirty(mapping)) {
+			__dec_zone_state(oldzone, NR_FILE_DIRTY);
+			__inc_zone_state(newzone, NR_FILE_DIRTY);
+		}
 	}
-	spin_unlock_irq(&mapping->tree_lock);
+	local_irq_enable();
 
 	return MIGRATEPAGE_SUCCESS;
 }
@@ -544,20 +566,9 @@ void migrate_page_copy(struct page *newp
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
 
-	if (PageDirty(page)) {
-		clear_page_dirty_for_io(page);
-		/*
-		 * Want to mark the page and the radix tree as dirty, and
-		 * redo the accounting that clear_page_dirty_for_io undid,
-		 * but we can't use set_page_dirty because that function
-		 * is actually a signal that all of the page has become dirty.
-		 * Whereas only part of our page may be dirty.
-		 */
-		if (PageSwapBacked(page))
-			SetPageDirty(newpage);
-		else
-			__set_page_dirty_nobuffers(newpage);
- 	}
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (PageDirty(page))
+		SetPageDirty(newpage);
 
 	/*
 	 * Copy NUMA information to the new page, to prevent over-eager



  parent reply	other threads:[~2016-08-14 20:08 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20160814200812uscas1p1ef0170d47bedbb472ff4f71fa6e71b1c@uscas1p1.samsung.com>
2016-08-14 20:07 ` [PATCH 3.14 00/29] 3.14.76-stable review Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 01/29] USB: fix invalid memory access in hub_activate() Greg Kroah-Hartman
2016-08-14 20:07   ` Greg Kroah-Hartman [this message]
2016-08-14 20:07   ` [PATCH 3.14 03/29] printk: do cond_resched() between lines while outputting to consoles Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 04/29] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 05/29] sctp: Prevent soft lockup when sctp_accept() is called during a timeout event Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 06/29] x86/mm: Improve switch_mm() barrier comments Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 08/29] USB: fix up incorrect quirk Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 09/29] arm: oabi compat: add missing access checks Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 10/29] KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 11/29] apparmor: fix ref count leak when profile sha1 hash is read Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 12/29] random: strengthen input validation for RNDADDTOENTCNT Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 13/29] scsi: remove scsi_end_request Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 14/29] scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 15/29] IB/security: Restrict use of the write() interface Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 16/29] block: fix use-after-free in seq file Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 17/29] sysv, ipc: fix security-layer leaking Greg Kroah-Hartman
2016-08-21 11:49     ` Willy Tarreau
2016-08-29  9:23       ` Manfred Spraul
2016-08-29 11:49         ` Willy Tarreau
2016-08-14 20:07   ` [PATCH 3.14 18/29] fuse: fix wrong assignment of ->flags in fuse_send_init() Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 19/29] crypto: gcm - Filter out async ghash if necessary Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 20/29] crypto: scatterwalk - Fix test in scatterwalk_done Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 21/29] ext4: check for extents that wrap around Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 22/29] ext4: fix deadlock during page writeback Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 23/29] ext4: dont call ext4_should_journal_data() on the journal inode Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 24/29] ext4: short-cut orphan cleanup on error Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 25/29] bonding: set carrier off for devices created through netlink Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 26/29] net/irda: fix NULL pointer dereference on memory allocation failure Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 27/29] tcp: consider recv buf for the initial window scale Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 28/29] [PATCH 1/8] tcp: make challenge acks less predictable Greg Kroah-Hartman
2016-08-14 20:07   ` [PATCH 3.14 29/29] ext4: fix reference counting bug on block allocation error Greg Kroah-Hartman
2016-08-15 14:49   ` [PATCH 3.14 00/29] 3.14.76-stable review Guenter Roeck
2016-08-16  4:01   ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160814200731.521604418@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=ciwillia@brocade.com \
    --cc=cl@linux.com \
    --cc=dave@stgolabs.net \
    --cc=dvyukov@google.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).