stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	alan@lxorguk.ukuu.org.uk, Jan Kara <jack@suse.cz>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 10/54] mm: fix XFS oops due to dirty pages without buffers on s390
Date: Mon, 29 Oct 2012 14:40:11 -0700	[thread overview]
Message-ID: <20121029213803.907280277@linuxfoundation.org> (raw)
In-Reply-To: <20121029213802.697479610@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit ef5d437f71afdf4afdbab99213add99f4b1318fd upstream.

On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit.  Thus when a page is written to via buffered
write, HW dirty bit gets set and when we later map and unmap the page,
page_remove_rmap() finds the dirty bit and calls set_page_dirty().

Dirtying of a page which shouldn't be dirty can cause all sorts of
problems to filesystems.  The bug we observed in practice is that
buffers from the page get freed, so when the page gets later marked as
dirty and writeback writes it, XFS crashes due to an assertion
BUG_ON(!PagePrivate(page)) in page_buffers() called from
xfs_count_page_state().

Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then
page unmapped.

Fix the issue by ignoring s390 HW dirty bit for page cache pages of
mappings with mapping_cap_account_dirty().  This is safe because for
such mappings when a page gets marked as writeable in PTE it is also
marked dirty in do_wp_page() or do_page_fault().  When the dirty bit is
cleared by clear_page_dirty_for_io(), the page gets writeprotected in
page_mkclean().  So pagecache page is writeable if and only if it is
dirty.

Thanks to Hugh Dickins for pointing out mapping has to have
mapping_cap_account_dirty() for things to work and proposing a cleaned
up variant of the patch.

The patch has survived about two hours of running fsx-linux on tmpfs
while heavily swapping and several days of running on out build machines
where the original problem was triggered.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/rmap.c |   20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -56,6 +56,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
+#include <linux/backing-dev.h>
 
 #include <asm/tlbflush.h>
 
@@ -977,11 +978,8 @@ int page_mkclean(struct page *page)
 
 	if (page_mapped(page)) {
 		struct address_space *mapping = page_mapping(page);
-		if (mapping) {
+		if (mapping)
 			ret = page_mkclean_file(mapping, page);
-			if (page_test_and_clear_dirty(page_to_pfn(page), 1))
-				ret = 1;
-		}
 	}
 
 	return ret;
@@ -1167,6 +1165,7 @@ void page_add_file_rmap(struct page *pag
  */
 void page_remove_rmap(struct page *page)
 {
+	struct address_space *mapping = page_mapping(page);
 	bool anon = PageAnon(page);
 	bool locked;
 	unsigned long flags;
@@ -1189,8 +1188,19 @@ void page_remove_rmap(struct page *page)
 	 * this if the page is anon, so about to be freed; but perhaps
 	 * not if it's in swapcache - there might be another pte slot
 	 * containing the swap entry, but page not yet written to swap.
+	 *
+	 * And we can skip it on file pages, so long as the filesystem
+	 * participates in dirty tracking; but need to catch shm and tmpfs
+	 * and ramfs pages which have been modified since creation by read
+	 * fault.
+	 *
+	 * Note that mapping must be decided above, before decrementing
+	 * mapcount (which luckily provides a barrier): once page is unmapped,
+	 * it could be truncated and page->mapping reset to NULL at any moment.
+	 * Note also that we are relying on page_mapping(page) to set mapping
+	 * to &swapper_space when PageSwapCache(page).
 	 */
-	if ((!anon || PageSwapCache(page)) &&
+	if (mapping && !mapping_cap_account_dirty(mapping) &&
 	    page_test_and_clear_dirty(page_to_pfn(page), 1))
 		set_page_dirty(page);
 	/*



  parent reply	other threads:[~2012-10-29 21:40 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-29 21:40 [ 00/54] 3.4.17-stable review Greg Kroah-Hartman
2012-10-29 21:40 ` [ 01/54] drm/radeon: add some new SI PCI ids Greg Kroah-Hartman
2012-10-29 21:40 ` [ 02/54] drm/radeon: add error output if VM CS fails on cayman Greg Kroah-Hartman
2012-10-29 21:40 ` [ 03/54] ALSA: hda - add dock support for Thinkpad T430 Greg Kroah-Hartman
2012-10-29 21:40 ` [ 04/54] gen_init_cpio: avoid stack overflow when expanding Greg Kroah-Hartman
2012-10-29 21:40 ` [ 05/54] fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check Greg Kroah-Hartman
2012-10-29 21:40 ` [ 06/54] drivers/rtc/rtc-imxdi.c: add missing spin lock initialization Greg Kroah-Hartman
2012-10-29 21:40 ` [ 07/54] genalloc: stop crashing the system when destroying a pool Greg Kroah-Hartman
2012-10-29 21:40 ` [ 08/54] ARM: 7559/1: smp: switch away from the idmap before updating init_mm.mm_count Greg Kroah-Hartman
2012-10-29 21:40 ` [ 09/54] x86, mm: Trim memory in memblock to be page aligned Greg Kroah-Hartman
2012-10-29 21:40 ` Greg Kroah-Hartman [this message]
2012-10-29 21:40 ` [ 11/54] SUNRPC: Get rid of the xs_error_report socket callback Greg Kroah-Hartman
2012-10-29 21:40 ` [ 12/54] SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT Greg Kroah-Hartman
2012-10-29 21:40 ` [ 13/54] Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..." Greg Kroah-Hartman
2012-10-29 21:40 ` [ 14/54] SUNRPC: Prevent races in xs_abort_connection() Greg Kroah-Hartman
2012-10-29 21:40 ` [ 15/54] xhci: Fix potential NULL ptr deref in command cancellation Greg Kroah-Hartman
2012-10-29 21:40 ` [ 16/54] sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat() Greg Kroah-Hartman
2012-10-29 21:40 ` [ 17/54] Staging: android: binder: Fix memory leak on thread/process exit Greg Kroah-Hartman
2012-10-29 21:40 ` [ 18/54] Staging: android: binder: Allow using highmem for binder buffers Greg Kroah-Hartman
2012-10-29 21:40 ` [ 19/54] Drivers: hv: Cleanup error handling in vmbus_open() Greg Kroah-Hartman
2012-10-29 21:40 ` [ 20/54] ehci: fix Lucid nohandoff pci quirk to be more generic with BIOS versions Greg Kroah-Hartman
2012-10-29 21:40 ` [ 21/54] ehci: Add yet-another Lucid nohandoff pci quirk Greg Kroah-Hartman
2012-10-29 21:40 ` [ 22/54] usb-storage: add unusual_devs entry for Casio EX-N1 digital camera Greg Kroah-Hartman
2012-10-29 21:40 ` [ 23/54] usb hub: send clear_tt_buffer_complete events when canceling TT clear work Greg Kroah-Hartman
2012-10-29 21:40 ` [ 24/54] USB: whiteheat: fix memory leak in error path Greg Kroah-Hartman
2012-10-29 21:40 ` [ 25/54] USB: opticon: fix DMA from stack Greg Kroah-Hartman
2012-10-29 21:40 ` [ 26/54] USB: opticon: fix memory leak in error path Greg Kroah-Hartman
2012-10-29 21:40 ` [ 27/54] USB: serial: Fix memory leak in sierra_release() Greg Kroah-Hartman
2012-10-29 21:40 ` [ 28/54] USB: sierra: fix memory leak in attach error path Greg Kroah-Hartman
2012-10-29 21:40 ` [ 29/54] USB: sierra: fix memory leak in probe " Greg Kroah-Hartman
2012-10-29 21:40 ` [ 30/54] USB: mos7840: fix urb leak at release Greg Kroah-Hartman
2012-10-29 21:40 ` [ 31/54] USB: mos7840: fix port-device leak in error path Greg Kroah-Hartman
2012-10-29 21:40 ` [ 32/54] USB: mos7840: remove NULL-urb submission Greg Kroah-Hartman
2012-10-29 21:40 ` [ 33/54] USB: mos7840: remove invalid disconnect handling Greg Kroah-Hartman
2012-10-29 21:40 ` [ 34/54] vhost: fix mergeable bufs on BE hosts Greg Kroah-Hartman
2012-10-29 21:40 ` [ 35/54] ARM: SAMSUNG: Add naming of s3c64xx-spi devices Greg Kroah-Hartman
2012-11-04  6:17   ` Colin Cross
2012-11-05  7:51     ` Greg Kroah-Hartman
2012-11-05  7:55       ` Colin Cross
2012-10-29 21:40 ` [ 36/54] ARM: at91/tc: fix typo in the DT document Greg Kroah-Hartman
2012-10-29 21:40 ` [ 37/54] ARM: at91/i2c: change id to let i2c-gpio work Greg Kroah-Hartman
2012-10-29 21:40 ` [ 38/54] ARM: at91: at91sam9g10: fix SOC type detection Greg Kroah-Hartman
2012-10-29 21:40 ` [ 39/54] mac80211: check if key has TKIP type before updating IV Greg Kroah-Hartman
2012-10-29 21:40 ` [ 40/54] Bluetooth: SMP: Fix setting unknown auth_req bits Greg Kroah-Hartman
2012-10-29 21:40 ` [ 41/54] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD Greg Kroah-Hartman
2012-10-29 21:40 ` [ 42/54] dmaengine: sirf: fix a typo in dma_prep_interleaved Greg Kroah-Hartman
2012-10-29 21:40 ` [ 43/54] dmaengine: sirf: fix a typo in moving running dma_desc to active queue Greg Kroah-Hartman
2012-10-29 21:40 ` [ 44/54] dmaengine: imx-dma: fix missing unlock on error in imxdma_xfer_desc() Greg Kroah-Hartman
2012-10-29 21:40 ` [ 45/54] bcma: fix unregistration of cores Greg Kroah-Hartman
2012-10-29 21:40 ` [ 46/54] cpufreq / powernow-k8: Remove usage of smp_processor_id() in preemptible code Greg Kroah-Hartman
2012-10-29 21:40 ` [ 47/54] Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz" Greg Kroah-Hartman
2012-10-29 21:40 ` [ 48/54] x86, mm: Find_early_table_space based on ranges that are actually being mapped Greg Kroah-Hartman
2012-10-29 21:40 ` [ 49/54] x86, mm: Undo incorrect revert in arch/x86/mm/init.c Greg Kroah-Hartman
2012-10-29 21:40 ` [ 50/54] efi: Defer freeing boot services memory until after ACPI init Greg Kroah-Hartman
2012-10-29 21:40 ` [ 51/54] x86: efi: Turn off efi_enabled after setup on mixed fw/kernel Greg Kroah-Hartman
2012-10-29 21:40 ` [ 52/54] staging: comedi: amplc_pc236: fix invalid register access during detach Greg Kroah-Hartman
2012-10-29 21:40 ` [ 53/54] x86, mm: Use memblock memory loop instead of e820_RAM Greg Kroah-Hartman
2012-10-29 21:40 ` [ 54/54] drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13 Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121029213803.907280277@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=schwidefsky@de.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).