From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Peter Feiner <pfeiner@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Cyrill Gorcunov <gorcunov@openvz.org>,
Pavel Emelyanov <xemul@parallels.com>,
Jamie Liu <jamieliu@google.com>, Hugh Dickins <hughd@google.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/3] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
Date: Sun, 24 Aug 2014 02:00:11 +0300
Message-ID: <20140823230011.GA26483@node.dhcp.inet.fi>
In-Reply-To: <1408831921-10168-2-git-send-email-pfeiner@google.com>
On Sat, Aug 23, 2014 at 06:11:59PM -0400, Peter Feiner wrote:
> For VMAs that don't want write notifications, PTEs created for read
> faults have their write bit set. If the read fault happens after
> VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain
> clear after subsequent writes.
>
> Here's a simple code snippet to demonstrate the bug:
>
> char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
> MAP_ANONYMOUS | MAP_SHARED, -1, 0);
> system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
> assert(*m == '\0'); /* new PTE allows write access */
> assert(!soft_dirty(x));
> *m = 'x'; /* should dirty the page */
> assert(soft_dirty(x)); /* fails */
>
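For anyone reproducing this: the snippet above relies on a soft_dirty() helper that is not shown. A minimal userspace sketch follows, assuming the helper takes the address of the page to check and reads bit 55 (the soft-dirty bit) of that page's /proc/self/pagemap entry. The names entry_soft_dirty() and PM_SOFT_DIRTY_BIT are made up for illustration, and this is not necessarily how the patch author tested it.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty bit. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: extract the soft-dirty bit from a raw entry. */
static int entry_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Assumed shape of the soft_dirty() helper used in the snippet.
 * Returns 1 if the page containing addr is soft-dirty, 0 if it is
 * not, and -1 if the pagemap entry could not be read.
 */
static int soft_dirty(const void *addr)
{
	uint64_t entry;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	/* One 64-bit entry per virtual page, indexed by page number. */
	offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
		 (off_t)sizeof(entry);

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	n = read(fd, &entry, sizeof(entry));
	close(fd);

	if (n != (ssize_t)sizeof(entry))
		return -1;
	return entry_soft_dirty(entry);
}
```

With a helper like this the asserts would be written as soft_dirty(m), passing the mapped address. The bit only carries meaning on kernels built with CONFIG_MEM_SOFT_DIRTY.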
> With this patch, write notifications are enabled when VM_SOFTDIRTY is
> cleared. Furthermore, to avoid faults, write notifications are
> disabled when VM_SOFTDIRTY is reset.
>
> Signed-off-by: Peter Feiner <pfeiner@google.com>
> ---
> v1 -> v2: Instead of checking VM_SOFTDIRTY in the fault handler, enable write
> notifications on vm_page_prot when we clear VM_SOFTDIRTY.
>
> fs/proc/task_mmu.c | 17 ++++++++++++++++-
> include/linux/mm.h | 15 +++++++++++++++
> mm/mmap.c | 10 +++++++++-
> 3 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index dfc791c..f1a5382 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -851,8 +851,23 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
> continue;
> if (type == CLEAR_REFS_SOFT_DIRTY) {
> - if (vma->vm_flags & VM_SOFTDIRTY)
> + if (vma->vm_flags & VM_SOFTDIRTY) {
Why do we need the branch here? Does it save us anything?
It looks like we can clear VM_SOFTDIRTY in vm_flags and enable write
notifications unconditionally. The indentation level is high enough already.
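To make the point concrete, here is a small userspace mock, not kernel code. struct vma_mock, mock_enable_writenotify() and both clear_soft_dirty_*() functions are hypothetical stand-ins, and the VM_SOFTDIRTY value is only illustrative. The sketch assumes that enabling write notifications is idempotent, which is what would make the unconditional form safe.

```c
#include <assert.h>
#include <stddef.h>

#define VM_SOFTDIRTY 0x08000000UL	/* illustrative value for this mock */

/* Hypothetical stand-in for the fields of vm_area_struct used here. */
struct vma_mock {
	unsigned long vm_flags;
	int writenotify;	/* 1 once write notifications are enabled */
};

/* Mock of vma_enable_writenotify(); assumed to be idempotent. */
static void mock_enable_writenotify(struct vma_mock *vma)
{
	vma->writenotify = 1;
}

/* The form in the patch: only act when VM_SOFTDIRTY is set. */
static void clear_soft_dirty_branch(struct vma_mock *vma)
{
	assert(vma != NULL);
	if (vma->vm_flags & VM_SOFTDIRTY) {
		vma->vm_flags &= ~VM_SOFTDIRTY;
		mock_enable_writenotify(vma);
	}
}

/* The suggested form: clear the flag and enable notifications always. */
static void clear_soft_dirty_unconditional(struct vma_mock *vma)
{
	assert(vma != NULL);
	vma->vm_flags &= ~VM_SOFTDIRTY;
	mock_enable_writenotify(vma);
}
```

Starting from a VMA with VM_SOFTDIRTY set, both variants end in the same state, and repeating the unconditional one changes nothing further. They differ only for a VMA whose flag is already clear but whose notifications were never enabled, which should not occur if every clear goes through this path.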
> vma->vm_flags &= ~VM_SOFTDIRTY;
> + /*
> + * We don't have a write lock on
> + * mm->mmap_sem, so we race with the
> + * fault handler reading vm_page_prot.
> + * Therefore writable PTEs (that won't
> + * have soft-dirty set) can be created
> + * for read faults. However, since the
> + * PTE lock is held while vm_page_prot
> + * is read and while we write protect
> + * PTEs during our walk, any writable
> + * PTEs that slipped through will be
> + * write protected.
> + */
Hm.. Isn't this yet another bug?
Updating vma->vm_flags without down_write(&mm->mmap_sem) looks troublesome
to me. Am I wrong?
> + vma_enable_writenotify(vma);
> + }
> }
> walk_page_range(vma->vm_start, vma->vm_end,
> &clear_refs_walk);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8981cc8..5f26634 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1946,6 +1946,21 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
> }
> #endif
>
> +/* Enable write notifications without blowing away special flags. */
> +static inline void vma_enable_writenotify(struct vm_area_struct *vma)
> +{
> + vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> + vm_get_page_prot(vma->vm_flags &
> + ~VM_SHARED));
I think this way is more readable:

	pgprot_t newprot;
	newprot = vm_get_page_prot(vma->vm_flags & ~VM_SHARED);
	vma->vm_page_prot = pgprot_modify(vma->vm_page_prot, newprot);
> +}
> +
> +/* Disable write notifications without blowing away special flags. */
> +static inline void vma_disable_writenotify(struct vm_area_struct *vma)
> +{
> + vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> + vm_get_page_prot(vma->vm_flags));
Ditto: a local newprot variable would be more readable here as well.
> +}
> +
> #ifdef CONFIG_NUMA_BALANCING
> unsigned long change_prot_numa(struct vm_area_struct *vma,
> unsigned long start, unsigned long end);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c1f2ea4..abcac32 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1549,8 +1549,16 @@ munmap_back:
> * Can we just expand an old mapping?
> */
> vma = vma_merge(mm, prev, addr, addr + len, vm_flags, NULL, file, pgoff, NULL);
> - if (vma)
> + if (vma) {
> + if (!vma_wants_writenotify(vma)) {
> + /*
> + * We're going to reset VM_SOFTDIRTY, so we can disable
> + * write notifications.
> + */
> + vma_disable_writenotify(vma);
> + }
> goto out;
> + }
>
> /*
> * Determine the object being mapped and call the appropriate
> --
> 2.1.0.rc2.206.gedb03e5
>
--
Kirill A. Shutemov