linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: Fix XFS oops due to dirty pages without buffers on s390
@ 2012-10-01 16:26 Jan Kara
  2012-10-08 14:28 ` Mel Gorman
  2012-10-09  4:24 ` Hugh Dickins
  0 siblings, 2 replies; 27+ messages in thread
From: Jan Kara @ 2012-10-01 16:26 UTC (permalink / raw)
  To: linux-mm; +Cc: LKML, xfs, Jan Kara, Martin Schwidefsky, Mel Gorman, linux-s390

On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit. Thus when a page is written to via standard write, HW
dirty bit gets set and when we later map and unmap the page, page_remove_rmap()
finds the dirty bit and calls set_page_dirty().

Dirtying of a page which shouldn't be dirty can cause all sorts of problems to
filesystems. The bug we observed in practice is that buffers from the page get
freed, so when the page gets later marked as dirty and writeback writes it, XFS
crashes due to an assertion BUG_ON(!PagePrivate(page)) in page_buffers() called
from xfs_count_page_state().

Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then page
unmapped.

Fix the issue by ignoring s390 HW dirty bit for page cache pages in
page_mkclean() and page_remove_rmap(). This is safe because when a page gets
marked as writeable in PTE it is also marked dirty in do_wp_page() or
do_page_fault(). When the dirty bit is cleared by clear_page_dirty_for_io(),
the page gets writeprotected in page_mkclean(). So pagecache page is writeable
if and only if it is dirty.

CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Mel Gorman <mgorman@suse.de>
CC: linux-s390@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/rmap.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 0f3b7cd..6ce8ddb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -973,7 +973,15 @@ int page_mkclean(struct page *page)
 		struct address_space *mapping = page_mapping(page);
 		if (mapping) {
 			ret = page_mkclean_file(mapping, page);
-			if (page_test_and_clear_dirty(page_to_pfn(page), 1))
+			/*
+			 * We ignore dirty bit for pagecache pages. It is safe
+			 * as page is marked dirty iff it is writeable (page is
+			 * marked as dirty when it is made writeable and
+			 * clear_page_dirty_for_io() writeprotects the page
+			 * again).
+			 */
+			if (PageSwapCache(page) &&
+			    page_test_and_clear_dirty(page_to_pfn(page), 1))
 				ret = 1;
 		}
 	}
@@ -1183,8 +1191,12 @@ void page_remove_rmap(struct page *page)
 	 * this if the page is anon, so about to be freed; but perhaps
 	 * not if it's in swapcache - there might be another pte slot
 	 * containing the swap entry, but page not yet written to swap.
+	 * For pagecache pages, we don't care about dirty bit in storage
+	 * key because the page is writeable iff it is dirty (page is marked
+	 * as dirty when it is made writeable and clear_page_dirty_for_io()
+	 * writeprotects the page again).
 	 */
-	if ((!anon || PageSwapCache(page)) &&
+	if (PageSwapCache(page) &&
 	    page_test_and_clear_dirty(page_to_pfn(page), 1))
 		set_page_dirty(page);
 	/*
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* [PATCH] mm: Fix XFS oops due to dirty pages without buffers on s390
@ 2012-10-22 15:06 Jan Kara
  2012-10-22 19:38 ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kara @ 2012-10-22 15:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Jan Kara, Martin Schwidefsky, Mel Gorman, linux-s390,
	Hugh Dickins

On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit. Thus when a page is written to via buffered write, HW
dirty bit gets set and when we later map and unmap the page, page_remove_rmap()
finds the dirty bit and calls set_page_dirty().

Dirtying of a page which shouldn't be dirty can cause all sorts of problems to
filesystems. The bug we observed in practice is that buffers from the page get
freed, so when the page gets later marked as dirty and writeback writes it, XFS
crashes due to an assertion BUG_ON(!PagePrivate(page)) in page_buffers() called
from xfs_count_page_state().

Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then page
unmapped.

Fix the issue by ignoring s390 HW dirty bit for page cache pages of mappings
with mapping_cap_account_dirty(). This is safe because for such mappings when a
page gets marked as writeable in PTE it is also marked dirty in do_wp_page() or
do_page_fault(). When the dirty bit is cleared by clear_page_dirty_for_io(),
the page gets writeprotected in page_mkclean(). So pagecache page is writeable
if and only if it is dirty.

Thanks to Hugh Dickins <hughd@google.com> for pointing out mapping has to have
mapping_cap_account_dirty() for things to work and proposing a cleaned up
variant of the patch.

The patch has survived about two hours of running fsx-linux on tmpfs while
heavily swapping and several days of running on out build machines where the
original problem was triggered.

CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Mel Gorman <mgorman@suse.de>
CC: linux-s390@vger.kernel.org
CC: Hugh Dickins <hughd@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/rmap.c |   20 +++++++++++++++-----
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7df7984..2ee1ef0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -56,6 +56,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
+#include <linux/backing-dev.h>
 
 #include <asm/tlbflush.h>
 
@@ -926,11 +927,8 @@ int page_mkclean(struct page *page)
 
 	if (page_mapped(page)) {
 		struct address_space *mapping = page_mapping(page);
-		if (mapping) {
+		if (mapping)
 			ret = page_mkclean_file(mapping, page);
-			if (page_test_and_clear_dirty(page_to_pfn(page), 1))
-				ret = 1;
-		}
 	}
 
 	return ret;
@@ -1116,6 +1114,7 @@ void page_add_file_rmap(struct page *page)
  */
 void page_remove_rmap(struct page *page)
 {
+	struct address_space *mapping = page_mapping(page);
 	bool anon = PageAnon(page);
 	bool locked;
 	unsigned long flags;
@@ -1138,8 +1137,19 @@ void page_remove_rmap(struct page *page)
 	 * this if the page is anon, so about to be freed; but perhaps
 	 * not if it's in swapcache - there might be another pte slot
 	 * containing the swap entry, but page not yet written to swap.
+	 *
+	 * And we can skip it on file pages, so long as the filesystem
+	 * participates in dirty tracking; but need to catch shm and tmpfs
+	 * and ramfs pages which have been modified since creation by read
+	 * fault.
+	 *
+	 * Note that mapping must be decided above, before decrementing
+	 * mapcount (which luckily provides a barrier): once page is unmapped,
+	 * it could be truncated and page->mapping reset to NULL at any moment.
+	 * Note also that we are relying on page_mapping(page) to set mapping
+	 * to &swapper_space when PageSwapCache(page).
 	 */
-	if ((!anon || PageSwapCache(page)) &&
+	if (mapping && !mapping_cap_account_dirty(mapping) &&
 	    page_test_and_clear_dirty(page_to_pfn(page), 1))
 		set_page_dirty(page);
 	/*
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-12-18  7:30 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-01 16:26 [PATCH] mm: Fix XFS oops due to dirty pages without buffers on s390 Jan Kara
2012-10-08 14:28 ` Mel Gorman
2012-10-09  4:24 ` Hugh Dickins
2012-10-09  8:18   ` Martin Schwidefsky
2012-10-09 23:21     ` Hugh Dickins
2012-10-10 21:57       ` Hugh Dickins
2012-10-19 14:38       ` Martin Schwidefsky
2012-10-09  9:32   ` Mel Gorman
2012-10-09 23:00     ` Hugh Dickins
2012-10-09 16:21   ` Jan Kara
2012-10-10  2:19     ` Hugh Dickins
2012-10-10  8:55       ` Jan Kara
2012-10-10 21:28         ` Hugh Dickins
2012-10-11  7:42           ` Martin Schwidefsky
2012-10-10 21:56       ` Dave Chinner
2012-10-11  7:44         ` Martin Schwidefsky
2012-10-17  0:43       ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2012-10-22 15:06 Jan Kara
2012-10-22 19:38 ` Andrew Morton
2012-10-23  4:40   ` Hugh Dickins
2012-10-23 10:21   ` Jan Kara
2012-10-23 21:56     ` Andrew Morton
2012-10-24  8:30       ` Martin Schwidefsky
2012-10-25 20:01       ` Jan Kara
2012-12-14  8:45         ` Martin Schwidefsky
2012-12-17 23:31           ` Hugh Dickins
2012-12-18  7:30             ` Martin Schwidefsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).