From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org,
	Dan Williams <dan.j.williams@intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>
Subject: [PATCH 6/6] dax: Avoid page invalidation races and unnecessary radix tree traversals
Date: Tue, 27 Sep 2016 18:43:35 +0200
Message-ID: <1474994615-29553-7-git-send-email-jack@suse.cz>
In-Reply-To: <1474994615-29553-1-git-send-email-jack@suse.cz>

Currently each filesystem (possibly through generic_file_direct_write()
or iomap_dax_rw()) takes care of invalidating page tables and evicting
hole pages from the radix tree when write(2) to the file happens. This
invalidation is only necessary when write(2) results in some block
allocation. Furthermore, in its current place the invalidation races
with a page fault that instantiates a hole page just after we have
invalidated it.

So perform the page invalidation inside dax_do_io() (in dax_io()) and in
iomap_dax_actor(), where we can do it only when really necessary and
after blocks have been allocated, so nobody will instantiate new hole
pages anymore.
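The ordering argument can be illustrated with a deterministic toy model. This is a userspace sketch, not kernel code: `struct toy_page`, `toy_fault()`, `toy_write_old()` and `toy_write_new()` are hypothetical names, and a single boolean stands in for the hole page a fault would place in the radix tree and page tables. The optional "racing" fault is simply executed between the two steps of each write, which is the window the old placement leaves open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one file page; not kernel code. */
struct toy_page {
	bool block_allocated;	/* filesystem has a block for this offset */
	bool hole_page_mapped;	/* a zeroed hole page sits in page tables */
};

/* A fault on an offset without a block instantiates a hole page. */
static void toy_fault(struct toy_page *p)
{
	if (!p->block_allocated)
		p->hole_page_mapped = true;
}

/*
 * Old placement: invalidate up front, allocate later during the write.
 * A fault in between re-creates the hole page over what becomes a real
 * block, so mmap readers keep seeing zeroes.
 */
static void toy_write_old(struct toy_page *p, bool racing_fault)
{
	p->hole_page_mapped = false;	/* invalidate first */
	if (racing_fault)
		toy_fault(p);		/* race window */
	p->block_allocated = true;	/* block allocated by the write */
}

/*
 * New placement: allocate first, then invalidate only if the block is
 * new and a hole page is present (analogous to buffer_new()/IOMAP_F_NEW
 * together with a non-zero nrpages).
 */
static void toy_write_new(struct toy_page *p, bool racing_fault)
{
	bool was_new = !p->block_allocated;

	p->block_allocated = true;	/* block allocated by the write */
	if (racing_fault)
		toy_fault(p);		/* no-op: a block now exists */
	if (was_new && p->hole_page_mapped)
		p->hole_page_mapped = false;	/* invalidate after allocation */

	/* Invariant the new placement establishes: no hole page over a block. */
	assert(!(p->block_allocated && p->hole_page_mapped));
}
```

On a fresh `struct toy_page`, `toy_write_old(&p, true)` ends with both `block_allocated` and `hole_page_mapped` set, i.e. a stale hole page over a real block, while `toy_write_new(&p, true)` ends with `hole_page_mapped` cleared. The model ignores locking entirely; it only shows why invalidating after allocation closes the window.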

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index c8a639d2214e..2f69ca891aab 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -186,6 +186,18 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 				 */
 				WARN_ON_ONCE(rw == WRITE &&
 					     buffer_unwritten(bh));
+				/*
+				 * Write can allocate block for an area which
+				 * has a hole page mapped into page tables. We
+				 * have to tear down these mappings so that
+				 * data written by write(2) is visible in mmap.
+				 */
+				if (buffer_new(bh) &&
+				    inode->i_mapping->nrpages) {
+					invalidate_inode_pages2_range(
+						  inode->i_mapping, page,
+						  (bh_max - 1) >> PAGE_SHIFT);
+				}
 			} else {
 				unsigned done = bh->b_size -
 						(bh_max - (pos - first));
@@ -1410,6 +1422,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
 		return -EIO;
 
+	/*
+	 * Write can allocate block for an area which has a hole page mapped
+	 * into page tables. We have to tear down these mappings so that data
+	 * written by write(2) is visible in mmap.
+	 */
+	if (iomap->flags & IOMAP_F_NEW && inode->i_mapping->nrpages) {
+		invalidate_inode_pages2_range(inode->i_mapping,
+					      pos >> PAGE_SHIFT,
+					      (end - 1) >> PAGE_SHIFT);
+	}
+
 	while (pos < end) {
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		struct blk_dax_ctl dax = { 0 };
@@ -1469,23 +1492,6 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (iov_iter_rw(iter) == WRITE)
 		flags |= IOMAP_WRITE;
 
-	/*
-	 * Yes, even DAX files can have page cache attached to them:  A zeroed
-	 * page is inserted into the pagecache when we have to serve a write
-	 * fault on a hole.  It should never be dirtied and can simply be
-	 * dropped from the pagecache once we get real data for the page.
-	 *
-	 * XXX: This is racy against mmap, and there's nothing we can do about
-	 * it. We'll eventually need to shift this down even further so that
-	 * we can check if we allocated blocks over a hole first.
-	 */
-	if (mapping->nrpages) {
-		ret = invalidate_inode_pages2_range(mapping,
-				pos >> PAGE_SHIFT,
-				(pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT);
-		WARN_ON_ONCE(ret);
-	}
-
 	while (iov_iter_count(iter)) {
 		ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
 				iter, iomap_dax_actor);
-- 
2.6.6
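The condition and page range used by the iomap_dax_actor() hunk can be checked in isolation. The following userspace sketch assumes `end` is the exclusive end of the byte range being written (so the last page index is `(end - 1) >> PAGE_SHIFT`) and hard-codes a 4 KiB page size as an assumption, since PAGE_SHIFT is architecture dependent. `toy_invalidate_range()`, `struct toy_range` and `TOY_PAGE_SHIFT` are hypothetical; the patch itself passes the computed indices to invalidate_inode_pages2_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64; PAGE_SHIFT varies by arch. */
#define TOY_PAGE_SHIFT 12

struct toy_range {
	uint64_t first;	/* first page index to invalidate */
	uint64_t last;	/* last page index to invalidate, inclusive */
};

/*
 * Hypothetical helper mirroring the check added to iomap_dax_actor():
 * invalidate only if the extent was newly allocated and the mapping
 * caches any pages. [pos, end) is a byte range with end exclusive,
 * hence end - 1 for the last index. Returns false when nothing needs
 * to be invalidated.
 */
static bool toy_invalidate_range(bool extent_is_new, unsigned long nrpages,
				 uint64_t pos, uint64_t end,
				 struct toy_range *out)
{
	assert(end > pos);
	if (!extent_is_new || nrpages == 0)
		return false;
	out->first = pos >> TOY_PAGE_SHIFT;
	out->last = (end - 1) >> TOY_PAGE_SHIFT;
	return true;
}
```

For example, a 100-byte write at offset 4000 straddles a page boundary, so the sketch yields page indices 0 through 1; an 8192-byte write at offset 4096 yields 1 through 2. Requiring both a newly allocated extent and a non-zero nrpages is what lets writes that overwrite already-allocated blocks skip the radix tree walk, even when the mapping holds hole pages elsewhere in the file.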

