From: Nicholas Piggin <npiggin@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Michal Hocko <mhocko@suse.com>,
"kirill.shutemov" <kirill.shutemov@linux.intel.com>,
Minchan Kim <minchan@kernel.org>,
Mel Gorman <mgorman@techsingularity.net>,
kernel-team <kernel-team@fb.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Nadav Amit <nadav.amit@gmail.com>,
linux-arch <linux-arch@vger.kernel.org>
Subject: Re: mm,tlb: revert 4647706ebeee?
Date: Tue, 10 Jul 2018 15:04:10 +1000
Message-ID: <20180710150410.4207bbfa@roar.ozlabs.ibm.com>
In-Reply-To: <20180709171356.87d834e125f06e0cdaa72f85@linux-foundation.org>
On Mon, 9 Jul 2018 17:13:56 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 8 Jul 2018 01:25:38 +1000 Nicholas Piggin <npiggin@gmail.com> wrote:
>
> > On Fri, 06 Jul 2018 13:03:55 -0400
> > Rik van Riel <riel@surriel.com> wrote:
> >
> > > Hello,
> > >
> > > It looks like last summer, there were 2 sets of patches
> > > in flight to fix the issue of simultaneous mprotect/madvise
> > > calls unmapping PTEs, and some pages not being flushed from
> > > the TLB before returning to userspace.
> > >
> > > Minchan posted these patches:
> > > 56236a59556c ("mm: refactor TLB gathering API")
> > > 99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem")
> > >
> > > Around the same time, Mel posted:
> > > 4647706ebeee ("mm: always flush VMA ranges affected by zap_page_range")
> > >
> > > They both appear to solve the same bug.
> > >
> > > Only one of the two solutions is needed.
> > >
> > > However, 4647706ebeee appears to introduce extra TLB
> > > flushes - one per VMA, instead of one over the entire
> > > range unmapped, and also extra flushes when there are
> > > no simultaneous unmappers of the same mm.
> > >
> > > For that reason, it seems like we should revert
> > > 4647706ebeee and keep only Minchan's solution in
> > > the kernel.
> > >
> > > Am I overlooking any reason why we should not revert
> > > 4647706ebeee?
> >
> > Yes I think so. Discussed here recently:
> >
> > https://marc.info/?l=linux-mm&m=152878780528037&w=2
>
> Unclear if that was an ack ;)
>

Sure, I'm thinking Rik's mail is an ack for my patch :)

No, actually I think it's okay, but I was in the middle of testing
my series when Aneesh pointed out a bit was missing from powerpc,
so I had to go off and fix that. I think that fix is upstream now,
so I need to go back and re-test this revert.

Wouldn't hurt for other arch maintainers to have a look I guess
(cc linux-arch):

The problem powerpc had is that mmu_gather flushing will flush a
single page size, chosen from the ptes it encounters when we zap.
If we hit a different page size, it flushes and switches to the new
size. If we have concurrent zaps on the same range, the other
thread may already have cleared a large page pte, so we won't see
it and will only do a small page flush for that range. That means
we can return before the other thread has invalidated the TLB for
the large pages in the range we wanted to flush.
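
To make the race concrete, here is a toy userspace model of it. This is
only a sketch under stated assumptions: every name in it (toy_mm,
toy_zap, toy_flush, the psize enum, the four-slot layout) is
hypothetical and none of it is kernel code or the real mmu_gather API.
The two "threads" are replayed sequentially, with thread B having
cleared the large page pte but not yet flushed when thread A zaps the
same range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page sizes for the toy model; not kernel definitions. */
enum psize { PSIZE_NONE, PSIZE_SMALL, PSIZE_LARGE, PSIZE_COUNT };

#define NR_SLOTS 4

struct toy_mm {
	enum psize pte[NR_SLOTS];        /* PSIZE_NONE means cleared */
	bool tlb[NR_SLOTS][PSIZE_COUNT]; /* cached translations, per size */
};

/* Invalidate cached translations of one page size over [start, end). */
static void toy_flush(struct toy_mm *mm, enum psize size,
		      size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		mm->tlb[i][size] = false;
}

/* Zap [start, end); the flush size comes only from ptes still present. */
static void toy_zap(struct toy_mm *mm, size_t start, size_t end)
{
	enum psize cur = PSIZE_SMALL;    /* assumed default if nothing is seen */

	for (size_t i = start; i < end; i++) {
		enum psize seen = mm->pte[i];

		if (seen == PSIZE_NONE)
			continue;        /* maybe cleared by a concurrent zapper */
		if (seen != cur) {
			toy_flush(mm, cur, start, end); /* flush, then switch */
			cur = seen;
		}
		mm->pte[i] = PSIZE_NONE;
	}
	toy_flush(mm, cur, start, end);  /* final flush at the last size seen */
}

/* Count cached translations that no longer match the page table. */
static int toy_stale(const struct toy_mm *mm)
{
	int n = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		for (int s = PSIZE_SMALL; s < PSIZE_COUNT; s++)
			if (mm->tlb[i][s] && mm->pte[i] != (enum psize)s)
				n++;
	return n;
}

/* Stale entries left at the moment thread A returns from its zap. */
static int toy_stale_after_a(bool racing)
{
	static const enum psize layout[NR_SLOTS] = {
		PSIZE_SMALL, PSIZE_SMALL, PSIZE_LARGE, PSIZE_SMALL
	};
	struct toy_mm mm;

	for (size_t i = 0; i < NR_SLOTS; i++) {
		mm.pte[i] = layout[i];
		for (int s = 0; s < PSIZE_COUNT; s++)
			mm.tlb[i][s] = (s == (int)layout[i]);
	}
	if (racing)
		mm.pte[2] = PSIZE_NONE;  /* thread B cleared it, flush pending */
	toy_zap(&mm, 0, NR_SLOTS);       /* thread A zaps the range, returns */
	return toy_stale(&mm);
}
```

Calling toy_stale_after_a(true) leaves the large page translation for
slot 2 cached when A returns, while toy_stale_after_a(false) leaves
nothing behind, because A sees the large pte itself and switches flush
size.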

I suspect most arches are probably okay, but if you make any TLB
flush choices based on the pte contents, then you could be exposed.
The exception is archs like sparc and powerpc/hash, which do the
flushing in arch_leave_lazy_mmu_mode(): that is called under the
same page table lock, so there can't be a concurrent zap.

A quick look through the archs doesn't show anything obvious, but
please take a look at your arch.

And I'll try to do a bit more testing.

Thanks,
Nick