From: Minchan Kim <minchan@kernel.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	'Andrea Arcangeli' <aarcange@redhat.com>,
	'Andrew Morton' <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] thp: fix MADV_DONTNEED vs. MADV_FREE race
Date: Wed, 8 Mar 2017 15:17:26 +0900
Message-ID: <20170308061726.GD11206@bbox>
In-Reply-To: <20170307140453.GB2412@node>

On Tue, Mar 07, 2017 at 05:04:53PM +0300, Kirill A. Shutemov wrote:
> On Mon, Mar 06, 2017 at 10:44:46AM +0900, Minchan Kim wrote:
> > Hello, Kirill,
> > 
> > On Fri, Mar 03, 2017 at 01:26:36PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Mar 03, 2017 at 01:35:11PM +0800, Hillf Danton wrote:
> > > > 
> > > > On March 02, 2017 11:11 PM Kirill A. Shutemov wrote: 
> > > > > 
> > > > > Basically the same race as with numa balancing in change_huge_pmd(), but
> > > > > a bit simpler to mitigate: we don't need to preserve dirty/young flags
> > > > > here due to MADV_FREE functionality.
> > > > > 
> > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > > Cc: Minchan Kim <minchan@kernel.org>
> > > > > ---
> > > > >  mm/huge_memory.c | 2 --
> > > > >  1 file changed, 2 deletions(-)
> > > > > 
> > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > index bb2b3646bd78..324217c31ec9 100644
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -1566,8 +1566,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > > > >  		deactivate_page(page);
> > > > > 
> > > > >  	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
> > > > > -		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
> > > > > -			tlb->fullmm);
> > > > >  		orig_pmd = pmd_mkold(orig_pmd);
> > > > >  		orig_pmd = pmd_mkclean(orig_pmd);
> > > > > 
> > > > $ grep -n set_pmd_at  linux-4.10/arch/powerpc/mm/pgtable-book3s64.c
> > > > 
> > > > /*
> > > >  * set a new huge pmd. We should not be called for updating
> > > >  * an existing pmd entry. That should go via pmd_hugepage_update.
> > > >  */
> > > > void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> > > 
> > > +Aneesh.
> > > 
> > > Urgh... Power is special again.
> > > 
> > > I think this should work fine.
> > > 
> > > From 056914fa025992c0a2212aee057c26307ce60238 Mon Sep 17 00:00:00 2001
> > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > Date: Thu, 2 Mar 2017 16:47:45 +0300
> > > Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race
> > > 
> > > Basically the same race as with numa balancing in change_huge_pmd(), but
> > > a bit simpler to mitigate: we don't need to preserve dirty/young flags
> > > here due to MADV_FREE functionality.
> > 
> > Could you elaborate a bit more here rather than relying on another
> > patch's description?
> 
> Okay, updated patch is below.

Thanks. It looks much better.

> 
> > And could you say what happens to userspace if that race
> > happens? Judging from the title "MADV_DONTNEED vs MADV_FREE",
> > a page cannot be zapped but is marked lazyfree, or vice versa? Right?
> 
> The "vice versa" part should be fine. The case I'm worried about is that
> MADV_DONTNEED would skip the pmd and it would not be cleared.
> Userspace expects the area of memory to be clean after MADV_DONTNEED, but
> it's not. It can lead to userspace misbehaviour.

Yep.

> 
> From a0967b0293a6f8053d85785c4d6340e550e849ea Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Thu, 2 Mar 2017 16:47:45 +0300
> Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race
> 
> Both MADV_DONTNEED and MADV_FREE are handled under down_read(mmap_sem).
> It's critical not to clear the pmd, even temporarily, while handling
> MADV_FREE, to avoid a race with MADV_DONTNEED:
> 
> 	CPU0:				CPU1:
> 				madvise_free_huge_pmd()
> 				 pmdp_huge_get_and_clear_full()
> madvise_dontneed()
>  zap_pmd_range()
>   pmd_trans_huge(*pmd) == 0 (without ptl)
>   // skip the pmd
> 				 set_pmd_at();
> 				 // pmd is re-established
> 
> It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It
> violates the MADV_DONTNEED interface and can result in userspace misbehaviour.
> 
> Basically it's the same race as with numa balancing in change_huge_pmd(),
> but a bit simpler to mitigate: we don't need to preserve dirty/young flags
> here due to MADV_FREE functionality.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
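
As a rough illustration of the interleaving in the commit message above, the
race can be replayed deterministically in a small userspace C model.
Everything here is hypothetical: the toy_* names, the flag values, and the
single-threaded replay are assumptions made for this sketch, not kernel code
and not the patch itself. The model captures only one property: whether the
lockless check in the MADV_DONTNEED path can observe a transiently empty
entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; not any architecture's real pmd layout. */
#define TOY_PMD_HUGE  (1u << 0)
#define TOY_PMD_YOUNG (1u << 1)
#define TOY_PMD_DIRTY (1u << 2)

typedef uint32_t toy_pmd_t;

/*
 * Replay the interleaving from the commit message in a fixed order.
 * clear_first == true models an MADV_FREE path that empties the entry
 * before rewriting it; false models one that never leaves it empty.
 * Returns the final entry; 0 means MADV_DONTNEED really cleared it.
 */
toy_pmd_t toy_replay(bool clear_first)
{
	toy_pmd_t pmd = TOY_PMD_HUGE | TOY_PMD_YOUNG | TOY_PMD_DIRTY;
	toy_pmd_t orig = pmd;
	bool saw_huge;

	/* CPU1 (MADV_FREE), holding the page table lock. */
	if (clear_first)
		pmd = 0;			/* entry is transiently empty */

	/* CPU0 (MADV_DONTNEED): check done without the lock. */
	saw_huge = (pmd & TOY_PMD_HUGE) != 0;

	/* CPU1: write back the entry with young/dirty dropped, unlock. */
	orig &= ~(TOY_PMD_YOUNG | TOY_PMD_DIRTY);
	assert(orig & TOY_PMD_HUGE);		/* still a huge mapping */
	pmd = orig;

	/* CPU0: zap only if the earlier check saw a huge pmd. */
	if (saw_huge)
		pmd = 0;			/* would take the lock, then zap */

	return pmd;
}
```

With clear_first set, toy_replay() returns an entry that is still present,
which corresponds to MADV_DONTNEED skipping the pmd. With it unset, the
lockless check sees the huge pmd and the entry ends up cleared.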
