From: Peter Feiner <pfeiner@google.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Peter Feiner <pfeiner@google.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Jamie Liu <jamieliu@google.com>, Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH v2 1/3] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
Date: Sat, 23 Aug 2014 18:11:59 -0400
Message-ID: <1408831921-10168-2-git-send-email-pfeiner@google.com>
In-Reply-To: <1408831921-10168-1-git-send-email-pfeiner@google.com>

For VMAs that don't want write notifications, PTEs created for read
faults have their write bit set. If the read fault happens after
VM_SOFTDIRTY is cleared, later writes through that PTE do not fault,
so the PTE's soft-dirty bit remains clear after those writes.

Here's a simple code snippet to demonstrate the bug:

  char *m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
  assert(*m == '\0');     /* read fault: new PTE allows write access */
  assert(!soft_dirty(m));
  *m = 'x';               /* should dirty the page */
  assert(soft_dirty(m));  /* fails */
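
The snippet calls a soft_dirty() helper that it does not define. A
minimal sketch of one possible version follows, assuming the documented
/proc/self/pagemap layout (one 64-bit entry per virtual page, with bit
55 reporting soft-dirty) and a 64-bit off_t. The helper names and the
-1 error convention are illustrative assumptions, not part of this
patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry reports the PTE's soft-dirty state. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: extract the soft-dirty bit from a raw entry. */
static int pagemap_entry_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Hypothetical helper: returns 1 if the page containing addr is
 * soft-dirty, 0 if it is not, and -1 if pagemap could not be read.
 */
static int soft_dirty(const void *addr)
{
	uint64_t entry;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	/* One 8-byte entry per virtual page, indexed by page number. */
	offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
		 (off_t)sizeof(entry);

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}

	n = read(fd, &entry, sizeof(entry));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;

	return pagemap_entry_soft_dirty(entry);
}
```

Because this sketch returns -1 on error, and -1 is truthy in C, a caller
that needs to tell "dirty" from "could not read" should compare the
result against 1 and 0 explicitly. The bit is only meaningful on kernels
built with CONFIG_MEM_SOFT_DIRTY.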

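With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared. Furthermore, to avoid unnecessary faults, write notifications
are disabled again when VM_SOFTDIRTY is set again, that is, when mmap
expands an existing VMA that does not otherwise want them.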
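
As a rough illustration of why dropping VM_SHARED from the flags passed
to vm_get_page_prot() enables write notifications, here is a toy
userspace model. It assumes, as on x86, that only shared writable
mappings get a page protection with the hardware write bit set, while
private writable mappings get a copy-on-write protection without it.
The toy_* functions are hypothetical stand-ins, not kernel APIs; only
the VM_* flag values match include/linux/mm.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/mm.h. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/*
 * Toy stand-in for the protection_map lookup done by
 * vm_get_page_prot(): PTEs start out writable only when the mapping
 * is both writable and shared.
 */
static bool toy_prot_is_writable(unsigned long vm_flags)
{
	return (vm_flags & VM_WRITE) && (vm_flags & VM_SHARED);
}

/*
 * Toy stand-in for the lookup in vma_enable_writenotify(): masking
 * out VM_SHARED selects the private protection, so new PTEs are
 * created write-protected and the first write takes a fault.
 */
static bool toy_writenotify_prot_is_writable(unsigned long vm_flags)
{
	return toy_prot_is_writable(vm_flags & ~VM_SHARED);
}
```

In this model a VM_READ | VM_WRITE | VM_SHARED mapping normally yields
writable PTEs, while the same flags looked up without VM_SHARED yield
write-protected ones, which is what lets the fault handler observe the
first write and mark the PTE soft-dirty.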

Signed-off-by: Peter Feiner <pfeiner@google.com>
---
 v1 -> v2: Instead of checking VM_SOFTDIRTY in the fault handler, enable write
           notifications on vm_page_prot when we clear VM_SOFTDIRTY.

 fs/proc/task_mmu.c | 17 ++++++++++++++++-
 include/linux/mm.h | 15 +++++++++++++++
 mm/mmap.c          | 10 +++++++++-
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfc791c..f1a5382 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -851,8 +851,23 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
 				continue;
 			if (type == CLEAR_REFS_SOFT_DIRTY) {
-				if (vma->vm_flags & VM_SOFTDIRTY)
+				if (vma->vm_flags & VM_SOFTDIRTY) {
 					vma->vm_flags &= ~VM_SOFTDIRTY;
+					/*
+					 * We don't have a write lock on
+					 * mm->mmap_sem, so we race with the
+					 * fault handler reading vm_page_prot.
+					 * Therefore writable PTEs (that won't
+					 * have soft-dirty set) can be created
+					 * for read faults. However, since the
+					 * PTE lock is held while vm_page_prot
+					 * is read and while we write protect
+					 * PTEs during our walk, any writable
+					 * PTEs that slipped through will be
+					 * write protected.
+					 */
+					vma_enable_writenotify(vma);
+				}
 			}
 			walk_page_range(vma->vm_start, vma->vm_end,
 					&clear_refs_walk);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc8..5f26634 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1946,6 +1946,21 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
 }
 #endif
 
+/* Enable write notifications without blowing away special flags. */
+static inline void vma_enable_writenotify(struct vm_area_struct *vma)
+{
+	vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
+	                                  vm_get_page_prot(vma->vm_flags &
+					                   ~VM_SHARED));
+}
+
+/* Disable write notifications without blowing away special flags. */
+static inline void vma_disable_writenotify(struct vm_area_struct *vma)
+{
+	vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
+	                                  vm_get_page_prot(vma->vm_flags));
+}
+
 #ifdef CONFIG_NUMA_BALANCING
 unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
diff --git a/mm/mmap.c b/mm/mmap.c
index c1f2ea4..abcac32 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1549,8 +1549,16 @@ munmap_back:
 	 * Can we just expand an old mapping?
 	 */
 	vma = vma_merge(mm, prev, addr, addr + len, vm_flags, NULL, file, pgoff, NULL);
-	if (vma)
+	if (vma) {
+		if (!vma_wants_writenotify(vma)) {
+			/*
+			 * We're going to reset VM_SOFTDIRTY, so we can disable
+			 * write notifications.
+			 */
+			vma_disable_writenotify(vma);
+		}
 		goto out;
+	}
 
 	/*
 	 * Determine the object being mapped and call the appropriate
-- 
2.1.0.rc2.206.gedb03e5
