[PATCH 1/2]: Fix BUG in cancel_dirty_pages on XFS

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: David Chinner <dgc@sgi.com>
To: linux-kernel@vger.kernel.org
Cc: xfs@oss.sgi.com, akpm@osdl.org
Subject: [PATCH 1/2]: Fix BUG in cancel_dirty_pages on XFS
Date: Wed, 24 Jan 2007 09:37:02 +1100	[thread overview]
Message-ID: <20070123223702.GF33919298@melbourne.sgi.com> (raw)


With the recent changes to cancel_dirty_pages(), XFS will
dump warnings in the syslog because it can truncate_inode_pages()
on dirty mapped pages.

I've determined that this is indeed correct behaviour for XFS
as this can happen in the case of races on mmap()d files with
direct I/O. In this case when we do a direct I/O read, we
flush the dirty pages to disk, then truncate them out of the
page cache. Unfortunately, between the flush and the truncate
the mmap could dirty the page again. At this point we toss a
dirty page that is mapped.

None of the existing functions for truncating pages or invalidating
pages work in this situation. Invalidating a page only works for
non-dirty pages with non-dirty buffers, and they only work for
whole pages and XFS requires partial page truncation.

On top of that the page invalidation functions don't actually
call into the filesystem to invalidate the page and so the filesystem
can't actually invalidate the page properly (e.g. do stuff based on
private buffer head flags).

So that leaves us needing to use truncate semantics an the problem
is that none of them unmap pages in a non-racy manner - if they
unmap pages they do it separately tothe truncate of the page,
leading to races with mmap redirtying the page between the unmap and
the truncate ofthe page.

Hence we need a truncate function that unmaps the pages while they
are locked for truncate in a similar fashion to
invalidate_inode_pages2_range(). The following patch (unchanged from
the last time it was sent) does this. The XFS changes are in a
second patch.

The patch has been test on ia64 and x86-64 via XFSQA and a lot
of fsx mixing mmap and direct I/O operations.

Signed-off-by: Dave Chinner <dgc@sgi.com>

---
 
Index: 2.6.x-xfs-new/include/linux/mm.h
===================================================================
--- 2.6.x-xfs-new.orig/include/linux/mm.h	2007-01-15 15:09:57.000000000 +1100
+++ 2.6.x-xfs-new/include/linux/mm.h	2007-01-16 08:59:24.031897743 +1100
@@ -1060,6 +1060,8 @@ extern unsigned long page_unuse(struct p
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
 				       loff_t lstart, loff_t lend);
+extern void truncate_unmap_inode_pages_range(struct address_space *,
+				       loff_t lstart, loff_t lend, int unmap);
 
 /* generic vm_area_ops exported for stackable file systems */
 extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int *);
Index: 2.6.x-xfs-new/mm/truncate.c
===================================================================
--- 2.6.x-xfs-new.orig/mm/truncate.c	2007-01-16 08:59:23.947908876 +1100
+++ 2.6.x-xfs-new/mm/truncate.c	2007-01-16 09:35:53.102924243 +1100
@@ -59,7 +59,7 @@ void cancel_dirty_page(struct page *page
 
 		WARN_ON(++warncount < 5);
 	}
-		
+
 	if (TestClearPageDirty(page)) {
 		struct address_space *mapping = page->mapping;
 		if (mapping && mapping_cap_account_dirty(mapping)) {
@@ -122,16 +122,34 @@ invalidate_complete_page(struct address_
 	return ret;
 }
 
+/*
+ * This is a helper for truncate_unmap_inode_page. Unmap the page we
+ * are passed. Page must be locked by the caller.
+ */
+static void
+unmap_single_page(struct address_space *mapping, struct page *page)
+{
+	BUG_ON(!PageLocked(page));
+	while (page_mapped(page)) {
+		unmap_mapping_range(mapping,
+			(loff_t)page->index << PAGE_CACHE_SHIFT,
+			PAGE_CACHE_SIZE, 0);
+	}
+}
+
 /**
- * truncate_inode_pages - truncate range of pages specified by start and
+ * truncate_unmap_inode_pages_range - truncate range of pages specified by
+ * start and end byte offsets and optionally unmap them first.
  * end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
  * @lend: offset to which to truncate
+ * @unmap: unmap whole truncated pages if non-zero
  *
  * Truncate the page cache, removing the pages that are between
  * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * (if lstart is not page aligned)). If specified, unmap the pages
+ * before they are removed.
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -146,8 +164,8 @@ invalidate_complete_page(struct address_
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
  */
-void truncate_inode_pages_range(struct address_space *mapping,
-				loff_t lstart, loff_t lend)
+void truncate_unmap_inode_pages_range(struct address_space *mapping,
+				loff_t lstart, loff_t lend, int unmap)
 {
 	const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	pgoff_t end;
@@ -162,6 +180,14 @@ void truncate_inode_pages_range(struct a
 	BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
 	end = (lend >> PAGE_CACHE_SHIFT);
 
+	/*
+	 * if unmapping, do a range unmap up front to minimise the
+	 * overhead of unmapping the pages
+	 */
+	if (unmap) {
+		unmap_mapping_range(mapping, (loff_t)start << PAGE_CACHE_SHIFT,
+					   (loff_t)end << PAGE_CACHE_SHIFT, 0);
+	}
 	pagevec_init(&pvec, 0);
 	next = start;
 	while (next <= end &&
@@ -184,6 +210,8 @@ void truncate_inode_pages_range(struct a
 				unlock_page(page);
 				continue;
 			}
+			if (unmap)
+				unmap_single_page(mapping, page);
 			truncate_complete_page(mapping, page);
 			unlock_page(page);
 		}
@@ -195,6 +223,8 @@ void truncate_inode_pages_range(struct a
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
 			wait_on_page_writeback(page);
+			if (unmap)
+				unmap_single_page(mapping, page);
 			truncate_partial_page(page, partial);
 			unlock_page(page);
 			page_cache_release(page);
@@ -224,12 +254,30 @@ void truncate_inode_pages_range(struct a
 			if (page->index > next)
 				next = page->index;
 			next++;
+			if (unmap)
+				unmap_single_page(mapping, page);
 			truncate_complete_page(mapping, page);
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
 	}
 }
+EXPORT_SYMBOL(truncate_unmap_inode_pages_range);
+
+/**
+ * truncate_inode_pages_range - truncate range of pages specified by start and
+ * end byte offsets
+ * @mapping: mapping to truncate
+ * @lstart: offset from which to truncate
+ * @lend: offset to which to truncate
+ *
+ * Called under (and serialised by) inode->i_mutex.
+ */
+void truncate_inode_pages_range(struct address_space *mapping,
+				loff_t lstart, loff_t lend)
+{
+	truncate_unmap_inode_pages_range(mapping, lstart, lend, 0);
+}
 EXPORT_SYMBOL(truncate_inode_pages_range);
 
 /**
@@ -241,7 +289,7 @@ EXPORT_SYMBOL(truncate_inode_pages_range
  */
 void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
 {
-	truncate_inode_pages_range(mapping, lstart, (loff_t)-1);
+	truncate_unmap_inode_pages_range(mapping, lstart, (loff_t)-1, 0);
 }
 EXPORT_SYMBOL(truncate_inode_pages);

next             reply	other threads:[~2007-01-23 22:37 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-23 22:37 David Chinner [this message]
2007-01-24 12:13 ` [PATCH 1/2]: Fix BUG in cancel_dirty_pages on XFS Peter Zijlstra
2007-01-24 13:43   ` Nick Piggin
2007-01-24 14:40     ` Peter Zijlstra
2007-01-25  0:05       ` Nick Piggin
2007-01-24 22:46     ` David Chinner
2007-01-25  0:12       ` Nick Piggin
2007-01-25  0:35         ` David Chinner
2007-01-25  0:47           ` Nick Piggin
2007-01-25  1:52             ` David Chinner
2007-01-25  2:01               ` Nick Piggin
2007-01-25  3:42                 ` David Chinner
2007-01-25  4:25                   ` Nick Piggin
2007-01-25  7:40                     ` David Chinner
2007-01-25 10:26                       ` Nick Piggin
2007-01-24 22:24   ` David Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070123223702.GF33919298@melbourne.sgi.com \
    --to=dgc@sgi.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox