stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: <gregkh@linuxfoundation.org>
To: ross.zwisler@linux.intel.com, akpm@linux-foundation.org,
	dan.j.williams@intel.com, gregkh@linuxfoundation.org,
	jack@suse.cz, torvalds@linux-foundation.org
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "dax: fix PMD data corruption when fault races with write" has been added to the 4.11-stable tree
Date: Thu, 18 May 2017 09:45:07 +0200	[thread overview]
Message-ID: <149509350714465@kroah.com> (raw)


This is a note to let you know that I've just added the patch titled

    dax: fix PMD data corruption when fault races with write

to the 4.11-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     dax-fix-pmd-data-corruption-when-fault-races-with-write.patch
and it can be found in the queue-4.11 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From 876f29460cbd4086b43475890c1bf2488fa11d40 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler@linux.intel.com>
Date: Fri, 12 May 2017 15:47:00 -0700
Subject: dax: fix PMD data corruption when fault races with write

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit 876f29460cbd4086b43475890c1bf2488fa11d40 upstream.

This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.

Currently DAX PMD read fault can race with write(2) in the following
way:

CPU1 - write(2)                 CPU2 - read fault
                                dax_iomap_pmd_fault()
                                  ->iomap_begin() - sees hole

dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate

                                  grab_mapping_entry()
				  - we add huge zero page to the radix tree
				    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6258 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/dax.c |   28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1353,6 +1353,16 @@ static int dax_iomap_pmd_fault(struct vm
 		goto fallback;
 
 	/*
+	 * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
+	 * PMD or a HZP entry.  If it can't (because a 4k page is already in
+	 * the tree, for instance), it will return -EEXIST and we just fall
+	 * back to 4k entries.
+	 */
+	entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
+	if (IS_ERR(entry))
+		goto fallback;
+
+	/*
 	 * Note that we don't use iomap_apply here.  We aren't doing I/O, only
 	 * setting up a mapping, so really we're using iomap_begin() as a way
 	 * to look up our filesystem block.
@@ -1360,21 +1370,11 @@ static int dax_iomap_pmd_fault(struct vm
 	pos = (loff_t)pgoff << PAGE_SHIFT;
 	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
 	if (error)
-		goto fallback;
+		goto unlock_entry;
 
 	if (iomap.offset + iomap.length < pos + PMD_SIZE)
 		goto finish_iomap;
 
-	/*
-	 * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
-	 * PMD or a HZP entry.  If it can't (because a 4k page is already in
-	 * the tree, for instance), it will return -EEXIST and we just fall
-	 * back to 4k entries.
-	 */
-	entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
-	if (IS_ERR(entry))
-		goto finish_iomap;
-
 	switch (iomap.type) {
 	case IOMAP_MAPPED:
 		result = dax_pmd_insert_mapping(vmf, &iomap, pos, &entry);
@@ -1382,7 +1382,7 @@ static int dax_iomap_pmd_fault(struct vm
 	case IOMAP_UNWRITTEN:
 	case IOMAP_HOLE:
 		if (WARN_ON_ONCE(write))
-			goto unlock_entry;
+			break;
 		result = dax_pmd_load_hole(vmf, &iomap, &entry);
 		break;
 	default:
@@ -1390,8 +1390,6 @@ static int dax_iomap_pmd_fault(struct vm
 		break;
 	}
 
- unlock_entry:
-	put_locked_mapping_entry(mapping, pgoff, entry);
  finish_iomap:
 	if (ops->iomap_end) {
 		int copied = PMD_SIZE;
@@ -1407,6 +1405,8 @@ static int dax_iomap_pmd_fault(struct vm
 		ops->iomap_end(inode, pos, PMD_SIZE, copied, iomap_flags,
 				&iomap);
 	}
+ unlock_entry:
+	put_locked_mapping_entry(mapping, pgoff, entry);
  fallback:
 	if (result == VM_FAULT_FALLBACK) {
 		split_huge_pmd(vma, vmf->pmd, vmf->address);


Patches currently in stable-queue which might be from ross.zwisler@linux.intel.com are

queue-4.11/fs-block_dev-always-invalidate-cleancache-in-invalidate_bdev.patch
queue-4.11/ext4-return-to-starting-transaction-in-ext4_dax_huge_fault.patch
queue-4.11/dax-prevent-invalidation-of-mapped-dax-entries.patch
queue-4.11/dax-fix-pmd-data-corruption-when-fault-races-with-write.patch
queue-4.11/fs-fix-data-invalidation-in-the-cleancache-during-direct-io.patch
queue-4.11/mm-fix-data-corruption-due-to-stale-mmap-reads.patch

                 reply	other threads:[~2017-05-18  7:45 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=149509350714465@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=ross.zwisler@linux.intel.com \
    --cc=stable-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).