Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely


From: Yu Zhao <yuzhao@google.com>
To: John Hubbard <jhubbard@nvidia.com>, Mark Rutland <mark.rutland@arm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Hugh Dickins" <hughd@google.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Lance Roy" <ldr709@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Dave Airlie" <airlied@redhat.com>,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Mel Gorman" <mgorman@suse.de>, "Jan Kara" <jack@suse.cz>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Aaron Lu" <ziqian.lzq@antfin.com>,
	"Omar Sandoval" <osandov@fb.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely
Date: Tue, 1 Oct 2019 18:00:46 -0600
Message-ID: <20191002000046.GA60764@google.com>
In-Reply-To: <712513fe-f064-c965-d165-80d43cfc606f@nvidia.com>

On Tue, Oct 01, 2019 at 03:31:51PM -0700, John Hubbard wrote:
> On 9/26/19 10:06 PM, Yu Zhao wrote:
> > On Thu, Sep 26, 2019 at 08:26:46PM -0700, John Hubbard wrote:
> >> On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
> >>> On Wed, Sep 25, 2019 at 04:26:54PM -0600, Yu Zhao wrote:
> >>>> On Wed, Sep 25, 2019 at 10:25:30AM +0200, Peter Zijlstra wrote:
> >>>>> On Tue, Sep 24, 2019 at 05:24:58PM -0600, Yu Zhao wrote:
> >> ...
> >>>>> I'm thinking this patch makes stuff rather fragile. Should we
> >>>>> stick the barrier in set_p*d_at() instead? Or rather, make that store a
> >>>>> store-release?
> >>>>
> >>>> I prefer it this way too, but I suspected the majority would be
> >>>> concerned with the performance implications, especially those
> >>>> looping set_pte_at()s in mm/huge_memory.c.
> >>>
> >>> We can rename the current set_pte_at() to __set_pte_at() or something and
> >>> leave it in places where the barrier is not needed. The new set_pte_at() will
> >>> be used in the rest of the places, with the barrier inside.
> >>
> >> +1, sounds nice. I was unhappy about the wide-ranging changes that would have
> >> to be maintained. So this seems much better.
> > 
> > Just to be clear, doing so will add unnecessary barriers to one
> > of the two paths that share set_pte_at().
> 
> Good point, maybe there's a better place to do it...
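
To make the two variants discussed above concrete, here is a minimal
user-space sketch using C11 atomics. It is only a model: the toy_* names
and types are made up for illustration and are not kernel APIs, the
release store stands in for "barrier inside set_pte_at()", and the
acquire load in the reader is a conservative stand-in for whatever
ordering the real lockless walker relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_page {
	int data;	/* page contents, set up before the page is published */
};

typedef _Atomic(struct toy_page *) toy_pte_t;

/*
 * Raw variant, in the role of the proposed __set_pte_at(): a relaxed
 * store with no ordering against earlier writes to *page.
 */
static inline void toy_set_pte_at_raw(toy_pte_t *ptep, struct toy_page *page)
{
	atomic_store_explicit(ptep, page, memory_order_relaxed);
}

/*
 * Ordered variant, in the role of the proposed new set_pte_at(): a
 * store-release, so earlier writes to *page are visible to any reader
 * that observes the new entry through an acquire load.
 */
static inline void toy_set_pte_at(toy_pte_t *ptep, struct toy_page *page)
{
	atomic_store_explicit(ptep, page, memory_order_release);
}

/* Lockless reader, loosely in the role of fast gup. Returns 1 on a hit. */
static inline int toy_fast_lookup(toy_pte_t *ptep, int *out)
{
	struct toy_page *page;

	assert(out != NULL);
	page = atomic_load_explicit(ptep, memory_order_acquire);
	if (!page)
		return 0;	/* nothing published yet */
	*out = page->data;
	return 1;
}
```

A caller that loops over many entries of a page nobody can see yet could
use the raw variant and pay for ordering once at the end, which is the
performance concern raised for the loops in mm/huge_memory.c.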
> 
> 
> > 
> >>> BTW, have you looked at other levels of the page table hierarchy? Do we have
> >>> the same issue for PMD/PUD/... pages?
> >>>
> >>
> >> Along the lines of "what other memory barriers might be missing for
> >> get_user_pages_fast()", I'm also concerned that the synchronization between
> >> get_user_pages_fast() and freeing the page tables might be technically broken,
> >> due to missing memory barriers on the get_user_pages_fast() side. Details:
> >>
> >> gup_fast() disables interrupts, but I think it also needs some sort of
> >> memory barrier(s), in order to prevent reads of the page table (gup_pgd_range,
> >> etc) from speculatively happening before the interrupts are disabled. 
> > 
> > I was under the impression that switching back from interrupt context
> > is a full barrier (otherwise wouldn't we be vulnerable to some side
> > channel attacks?), so the reader side wouldn't need an explicit rmb.
> > 
> 
> Documentation/memory-barriers.txt points out:
> 
> INTERRUPT DISABLING FUNCTIONS
> -----------------------------
> 
> Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
> (RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
> barriers are required in such a situation, they must be provided from some
> other means.
> 
> btw, I'm really sorry I missed your responses over the last 3 or 4 days.
> I just tracked down something in our email system that was sometimes
> moving some emails to spam (just few enough to escape immediate attention, argghh!).
> I think I killed it off for good now. I wasn't ignoring you. :)

Thanks, John. I agree with all you said, including the irq disabling
function not being a sufficient smp_rmb().
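
As a rough illustration of the distinction, here is a user-space sketch
in C11. The toy_* helpers are hypothetical stand-ins, not kernel code:
atomic_signal_fence() constrains only the compiler, which is how the
document quoted above describes the irq toggles, while
atomic_thread_fence(memory_order_acquire) loosely plays the role of
smp_rmb(). It shows where an explicit barrier would sit if one turns out
to be needed; whether it is needed is the open question here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for irq disabling: a flag plus a compiler-only barrier. */
static inline void toy_local_irq_disable(int *irqs_on)
{
	*irqs_on = 0;
	atomic_signal_fence(memory_order_seq_cst);	/* compiler barrier only */
}

/* Stand-in for irq enabling: again a compiler-only barrier. */
static inline void toy_local_irq_enable(int *irqs_on)
{
	atomic_signal_fence(memory_order_seq_cst);	/* compiler barrier only */
	*irqs_on = 1;
}

/* Loose stand-in for smp_rmb(): a fence the CPU has to honor as well. */
static inline void toy_smp_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/* Reader side: read one "page table entry" with irqs off. */
static inline int toy_walk(int *irqs_on, _Atomic int *entry)
{
	int val;

	assert(irqs_on && entry);
	toy_local_irq_disable(irqs_on);
	/*
	 * Explicit barrier. The irq toggle by itself would not stop a
	 * weakly ordered CPU from performing the load below early.
	 */
	toy_smp_rmb();
	val = atomic_load_explicit(entry, memory_order_relaxed);
	toy_local_irq_enable(irqs_on);
	return val;
}
```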

I was hoping somebody could clarify whether ipi handlers used by tlb
flush are sufficient to prevent CPU 1 from seeing any stale data from
freed page tables on all supported archs.

	CPU 1			CPU 2

				flush remote tlb by ipi
				wait for the ipi handler
	<ipi handler>
				free page table
	disable irq
	walk page table
	enable irq

I think they should because otherwise tlb flush wouldn't work if CPU 1
still sees stale data from the freed page table, unless there is a
really strange CPU cache design I'm not aware of.
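
The handshake in the diagram can be sketched as a small single-threaded
model in C. Everything here is hypothetical (the toy_* names, the pmd
field, the ack flag), and the model is sequentially consistent by
construction, so it assumes the answer to the visibility question
instead of proving it. It only shows the protocol: the freeing side
unlinks the table, sends the ipi, and frees only after the handler has
run, and the handler cannot run while the walker has irqs disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pgtable { bool freed; };
struct toy_mm { struct toy_pgtable *pmd; };	/* NULL once unlinked */

struct toy_cpu {
	bool irqs_enabled;
	bool ipi_pending;
	bool ipi_acked;
};

/* The ipi handler only runs while the target has irqs enabled. */
static inline void toy_run_ipi_handler(struct toy_cpu *cpu)
{
	if (cpu->irqs_enabled && cpu->ipi_pending) {
		cpu->ipi_pending = false;
		cpu->ipi_acked = true;
	}
}

/* CPU 2, step 1: unlink the table, then ask CPU 1 to flush. */
static inline struct toy_pgtable *
toy_unlink_and_send_ipi(struct toy_mm *mm, struct toy_cpu *target)
{
	struct toy_pgtable *pt = mm->pmd;

	mm->pmd = NULL;
	target->ipi_acked = false;
	target->ipi_pending = true;
	toy_run_ipi_handler(target);	/* delivered now if irqs are on */
	return pt;
}

/* CPU 2, step 2: free only once the handler has run on CPU 1. */
static inline bool toy_try_free(const struct toy_cpu *target,
				struct toy_pgtable *pt)
{
	if (!target->ipi_acked)
		return false;	/* still waiting for the ipi handler */
	pt->freed = true;
	return true;
}

/* CPU 1: disable irqs and read the upper-level entry. */
static inline struct toy_pgtable *toy_walk_begin(struct toy_cpu *cpu,
						 const struct toy_mm *mm)
{
	cpu->irqs_enabled = false;
	return mm->pmd;
}

/* CPU 1: use the table, then enable irqs. True if no freed table was used. */
static inline bool toy_walk_end(struct toy_cpu *cpu,
				const struct toy_pgtable *pt)
{
	bool ok = (pt == NULL) || !pt->freed;

	cpu->irqs_enabled = true;
	toy_run_ipi_handler(cpu);	/* a pending ipi is handled now */
	return ok;
}
```

In the diagram's order, toy_walk_begin() runs after the free and returns
NULL because the entry was already unlinked. If the walk starts first,
toy_try_free() keeps failing until toy_walk_end() has enabled irqs.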

Quoting comments from x86 ipi handler flush_tlb_func_common():
 * read active_mm's tlb_gen.  We don't need any explicit barriers
 * because all x86 flush operations are serializing and the
 * atomic64_read operation won't be reordered by the compiler.

For ppc64 ipi handler radix__flush_tlb_range(), there is an "eieio"
instruction:
  https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/assembler/idalangref_eieio_instrs.html
I'm not sure why it's not "sync" -- I'd guess something implicitly
works as "sync" already (or it's a bug).

I didn't find an ipi handler for tlb flush on arm64. There should be
one, otherwise fast gup on arm64 would be broken. Mark?

