From: Jerome Glisse <jglisse@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"David S. Miller" <davem@davemloft.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Henry Willard <henry.willard@oracle.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Zi Yan <zi.yan@cs.rutgers.edu>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: mprotect: check page dirty when change ptes
Date: Wed, 12 Sep 2018 09:24:39 -0400
Message-ID: <20180912132438.GB4009@redhat.com>
In-Reply-To: <20180912130355.GA4009@redhat.com>

On Wed, Sep 12, 2018 at 09:03:55AM -0400, Jerome Glisse wrote:
> On Wed, Sep 12, 2018 at 02:49:21PM +0800, Peter Xu wrote:
> > Add an extra check on the page dirty bit in change_pte_range(), since
> > there might be cases where the PTE dirty bit is unset but the page is
> > actually dirty.  One example is when a huge PMD is split after being
> > written: the dirty bit will be set on the compound page, but we won't
> > have the dirty bit set on each of the small-page PTEs.
> > 
> > I noticed this when debugging with a customized kernel that implemented
> > userfaultfd write-protect.  In that case, the dirty bit will be critical
> > since that's required for userspace to handle the write protect page
> > fault (otherwise it'll get a SIGBUS with a loop of page faults).
> > However it should still be good even for upstream Linux to cover more
> > scenarios where we shouldn't need to do extra page faults on the small
> > pages if the previous huge page is already written, so the dirty bit
> > optimization path underneath can cover more.
> > 
> 
> So, as Kirill said: NAK. You are not looking in the right place for
> your bug. Please first apply the patch below and read my analysis in
> my last reply.

Just to be clear: you are trying to fix a userspace bug that is hidden
for non-THP pages by a kernel-space bug inside userfaultfd, and you are
doing so by extending that kernel-space userfaultfd bug to THP too.


> 
> The patch below fixes the userfaultfd bug. I am not posting it as it is
> on a branch and I am not sure when Andrea plans to post. Andrea, feel
> free to squash that fix.
> 
> 
> From 35cdb30afa86424c2b9f23c0982afa6731be961c Mon Sep 17 00:00:00 2001
> From: Jérôme Glisse <jglisse@redhat.com>
> Date: Wed, 12 Sep 2018 08:58:33 -0400
> Subject: [PATCH] userfaultfd: do not set dirty accountable when changing
>  protection
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> mwriteprotect_range() has nothing to do with the dirty accountable
> optimization, so do not set it: doing so opens a door for userspace to
> remove write protection from pages in a range that is write protected,
> i.e. where the vma has !(vm_flags & VM_WRITE).
> 
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> ---
>  mm/userfaultfd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index a0379c5ffa7c..59db1ce48fa0 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -632,7 +632,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
>  		newprot = vm_get_page_prot(dst_vma->vm_flags);
>  
>  	change_protection(dst_vma, start, start + len, newprot,
> -				!enable_wp, 0);
> +				false, 0);
>  
>  	err = 0;
>  out_unlock:
> -- 
> 2.17.1
> 
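
To illustrate why the old `!enable_wp` argument matters, here is a minimal
userspace sketch in C. It is not kernel code: toy_vma, toy_pte,
toy_change_pte() and toy_mwriteprotect() are hypothetical stand-ins, and
the model assumes that change_pte_range() makes an already-dirty PTE
writable when dirty_accountable is set (soft-dirty handling and COW
details are left out).

```c
#include <stdbool.h>

/* Toy stand-ins for kernel state; these are NOT the real kernel types. */
struct toy_vma { bool vm_write; };            /* models vm_flags & VM_WRITE */
struct toy_pte { bool write; bool dirty; };   /* models PTE write/dirty bits */

/*
 * Simplified model (an assumption) of the dirty-accountable shortcut:
 * when dirty_accountable is set, an already-dirty PTE is made writable
 * right away so that a later write does not fault.
 */
static struct toy_pte toy_change_pte(struct toy_pte pte, bool newprot_write,
                                     bool dirty_accountable)
{
    pte.write = newprot_write;                /* models pte_modify() */
    if (dirty_accountable && pte.dirty)
        pte.write = true;                     /* models pte_mkwrite() */
    return pte;
}

/*
 * Models mwriteprotect_range() before the patch (patched == false, passes
 * !enable_wp) and after it (patched == true, passes false).
 */
static struct toy_pte toy_mwriteprotect(struct toy_vma vma, struct toy_pte pte,
                                        bool enable_wp, bool patched)
{
    /* Write-protecting drops write access; otherwise follow the vma. */
    bool newprot_write = enable_wp ? false : vma.vm_write;
    bool dirty_accountable = patched ? false : !enable_wp;

    return toy_change_pte(pte, newprot_write, dirty_accountable);
}
```

In this model, with a read-only vma and a dirty PTE,
toy_mwriteprotect(vma, pte, false, false) returns a writable PTE, while
the patched variant (last argument true) leaves the PTE read-only.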


Thread overview:
2018-09-12  6:49 [PATCH v2] mm: mprotect: check page dirty when change ptes Peter Xu
2018-09-12 10:33 ` Kirill A. Shutemov
2018-09-12 13:03 ` Jerome Glisse
2018-09-12 13:24   ` Jerome Glisse [this message]
2018-09-13  7:37     ` Peter Xu
2018-09-13 14:23       ` Jerome Glisse
2018-09-14  0:42         ` Jerome Glisse
2018-09-14  7:16           ` Peter Xu
2018-09-15  0:41             ` Jerome Glisse
2018-09-27  7:43               ` Peter Xu
2018-09-27  7:56                 ` Jerome Glisse
2018-09-27  8:21                   ` Peter Xu
