From: Yu Zhao <yuzhao@google.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Will Deacon <will@kernel.org>,
Laurent Dufour <ldufour@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Vinayak Menon <vinmenon@codeaurora.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>, Peter Xu <peterx@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm <linux-mm@kvack.org>,
lkml <linux-kernel@vger.kernel.org>,
Pavel Emelyanov <xemul@openvz.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
stable <stable@vger.kernel.org>, Minchan Kim <minchan@kernel.org>,
surenb@google.com, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
Date: Sun, 17 Jan 2021 02:16:55 -0700 [thread overview]
Message-ID: <YAQAhyOFqEdjTRPJ@google.com> (raw)
In-Reply-To: <85DAADF4-2537-40BD-8580-A57C201FF5F3@gmail.com>
On Sat, Jan 16, 2021 at 11:32:22PM -0800, Nadav Amit wrote:
> > On Jan 16, 2021, at 8:41 PM, Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Tue, Jan 12, 2021 at 09:43:38PM +0000, Will Deacon wrote:
> >> On Tue, Jan 12, 2021 at 12:38:34PM -0800, Nadav Amit wrote:
> >>>> On Jan 12, 2021, at 11:56 AM, Yu Zhao <yuzhao@google.com> wrote:
> >>>> On Tue, Jan 12, 2021 at 11:15:43AM -0800, Nadav Amit wrote:
> >>>>> I will send an RFC soon for per-table deferred TLB flushes tracking.
> >>>>> The basic idea is to save a generation in the page-struct that tracks
> >>>>> when deferred PTE change took place, and track whenever a TLB flush
> >>>>> completed. In addition, other users - such as mprotect - would use
> >>>>> the tlb_gather interface.
> >>>>>
> >>>>> Unfortunately, due to limited space in page-struct this would only
> >>>>> be possible for 64-bit (and my implementation is only for x86-64).
> >>>>
> >>>> I don't want to discourage you but I don't think this would end up
> >>>> well. PPC doesn't necessarily follow one-page-struct-per-table rule,
> >>>> and I've run into problems with this before while trying to do
> >>>> something similar.
> >>>
> >>> Discourage, discourage. Better now than later.
> >>>
> >>> It will be relatively easy to extend the scheme to be per-VMA instead of
> >>> per-table for architectures that prefer it this way. It does require
> >>> TLB-generation tracking though, which Andy only implemented for x86, so I
> >>> will focus on x86-64 right now.
> >>
> >> Can you remind me of what we're missing on arm64 in this area, please? I'm
> >> happy to help get this up and running once you have something I can build
> >> on.
> >
> > I noticed arm/arm64 don't support ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH.
> > Would it be something worth pursuing? Arm has been using mm_cpumask,
> > so it might not be too difficult I guess?
>
> [ +Mel Gorman who implemented ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH ]
>
> IIUC, there are at least two bugs in x86 implementation.
>
> First, there is a missing memory barrier in tlbbatch_add_mm() between
> inc_mm_tlb_gen() and the read of mm_cpumask().
In arch_tlbbatch_add_mm()? inc_mm_tlb_gen() has builtin barrier as its
comment says -- atomic update ops that return values are also full
memory barriers.
> Second, try_to_unmap_flush() clears flush_required after flushing. Another
> thread can call set_tlb_ubc_flush_pending() after the flush and before
> flush_required is cleared, and the indication that a TLB flush is pending
> can be lost.
This isn't a problem either because flush_required is per thread.
> I am working on addressing these issues among others, but, as you already
> saw, I am a bit slow.
>
> On a different but related topic: Another thing that I noticed that Arm does
> not do is batching TLB flushes across VMAs. Since Arm does not have its own
> tlb_end_vma(), it uses the default tlb_end_vma(), which flushes each VMA
> separately. Peter Zijlstra’s comment says that there are advantages in
> flushing each VMA separately, but I am not sure it is better or intentional
> (especially since x86 does not do so).
>
> I am trying to remove the arch-specific tlb_end_vma() and have a config
> option to control this behavior.
One thing worth noting is not all arm/arm64 hw versions support ranges.
(system_supports_tlb_range()). But IIUC what you are trying to do, this
isn't a problem.
> Again, sorry for being slow. I hope to send an RFC soon.
No worries. I brought it up only because I noticed it and didn't want
it to slip away.
next prev parent reply other threads:[~2021-01-17 9:17 UTC|newest]
Thread overview: 121+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-19 4:30 [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect Nadav Amit
2020-12-19 19:15 ` Andrea Arcangeli
2020-12-19 21:34 ` Nadav Amit
2020-12-19 22:06 ` Nadav Amit
2020-12-20 2:20 ` Andrea Arcangeli
2020-12-21 4:36 ` Nadav Amit
2020-12-21 5:12 ` Yu Zhao
2020-12-21 5:25 ` Nadav Amit
2020-12-21 5:39 ` Nadav Amit
2020-12-21 7:29 ` Yu Zhao
2020-12-22 20:34 ` Andy Lutomirski
2020-12-22 20:58 ` Nadav Amit
2020-12-22 21:34 ` Andrea Arcangeli
2020-12-20 2:01 ` Andy Lutomirski
2020-12-20 2:49 ` Andrea Arcangeli
2020-12-20 5:08 ` Andy Lutomirski
2020-12-21 18:03 ` Andrea Arcangeli
2020-12-21 18:22 ` Andy Lutomirski
2020-12-20 6:05 ` Yu Zhao
2020-12-20 8:06 ` Nadav Amit
2020-12-20 9:54 ` Yu Zhao
2020-12-21 3:33 ` Nadav Amit
2020-12-21 4:44 ` Yu Zhao
2020-12-21 17:27 ` Peter Xu
2020-12-21 18:31 ` Nadav Amit
2020-12-21 19:16 ` Yu Zhao
2020-12-21 19:55 ` Linus Torvalds
2020-12-21 20:21 ` Yu Zhao
2020-12-21 20:25 ` Linus Torvalds
2020-12-21 20:23 ` Nadav Amit
2020-12-21 20:26 ` Linus Torvalds
2020-12-21 21:24 ` Yu Zhao
2020-12-21 21:49 ` Nadav Amit
2020-12-21 22:30 ` Peter Xu
2020-12-21 22:55 ` Nadav Amit
2020-12-21 23:30 ` Linus Torvalds
2020-12-21 23:46 ` Nadav Amit
2020-12-22 19:44 ` Andrea Arcangeli
2020-12-22 20:19 ` Nadav Amit
2020-12-22 21:17 ` Andrea Arcangeli
2020-12-21 23:12 ` Yu Zhao
2020-12-21 23:33 ` Linus Torvalds
2020-12-22 0:00 ` Yu Zhao
2020-12-22 0:11 ` Linus Torvalds
2020-12-22 0:24 ` Yu Zhao
2020-12-21 23:22 ` Linus Torvalds
2020-12-22 3:19 ` Andy Lutomirski
2020-12-22 4:16 ` Linus Torvalds
2020-12-22 20:19 ` Andy Lutomirski
2021-01-05 15:37 ` Peter Zijlstra
2021-01-05 18:03 ` Andrea Arcangeli
2021-01-12 16:20 ` Peter Zijlstra
2021-01-12 11:43 ` Vinayak Menon
2021-01-12 15:47 ` Laurent Dufour
2021-01-12 16:57 ` Peter Zijlstra
2021-01-12 19:02 ` Laurent Dufour
2021-01-12 19:15 ` Nadav Amit
2021-01-12 19:56 ` Yu Zhao
2021-01-12 20:38 ` Nadav Amit
2021-01-12 20:49 ` Yu Zhao
2021-01-12 21:43 ` Will Deacon
2021-01-12 22:29 ` Nadav Amit
2021-01-12 22:46 ` Will Deacon
2021-01-13 0:31 ` Andy Lutomirski
2021-01-17 4:41 ` Yu Zhao
2021-01-17 7:32 ` Nadav Amit
2021-01-17 9:16 ` Yu Zhao [this message]
2021-01-17 10:13 ` Nadav Amit
2021-01-17 19:25 ` Yu Zhao
2021-01-18 2:49 ` Nadav Amit
2020-12-22 9:38 ` Nadav Amit
2020-12-22 19:31 ` Andrea Arcangeli
2020-12-22 20:15 ` Matthew Wilcox
2020-12-22 20:26 ` Andrea Arcangeli
2020-12-22 21:14 ` Yu Zhao
2020-12-22 22:02 ` Andrea Arcangeli
2020-12-22 23:39 ` Yu Zhao
2020-12-22 23:50 ` Linus Torvalds
2020-12-23 0:01 ` Linus Torvalds
2020-12-23 0:23 ` Yu Zhao
2020-12-23 2:17 ` Andrea Arcangeli
2020-12-23 9:44 ` Linus Torvalds
2020-12-23 10:06 ` Yu Zhao
2020-12-23 16:24 ` Peter Xu
2020-12-23 18:51 ` Andrea Arcangeli
2020-12-23 18:55 ` Andrea Arcangeli
2020-12-23 19:12 ` Yu Zhao
2020-12-23 19:32 ` Peter Xu
2020-12-23 0:20 ` Linus Torvalds
2020-12-23 2:56 ` Andrea Arcangeli
2020-12-23 3:36 ` Yu Zhao
2020-12-23 15:52 ` Peter Xu
2020-12-23 21:07 ` Andrea Arcangeli
2020-12-23 21:39 ` Andrea Arcangeli
2020-12-23 22:29 ` Yu Zhao
2020-12-23 23:04 ` Andrea Arcangeli
2020-12-24 1:21 ` Andy Lutomirski
2020-12-24 2:00 ` Andrea Arcangeli
2020-12-24 3:09 ` Nadav Amit
2020-12-24 3:30 ` Nadav Amit
2020-12-24 3:34 ` Yu Zhao
2020-12-24 4:01 ` Andrea Arcangeli
2020-12-24 5:18 ` Nadav Amit
2020-12-24 18:49 ` Andrea Arcangeli
2020-12-24 19:16 ` Andrea Arcangeli
2020-12-24 4:37 ` Nadav Amit
2020-12-24 3:31 ` Andrea Arcangeli
2020-12-23 23:39 ` Linus Torvalds
2020-12-24 1:01 ` Andrea Arcangeli
2020-12-22 21:14 ` Nadav Amit
2020-12-22 12:40 ` Nadav Amit
2020-12-22 18:30 ` Yu Zhao
2020-12-22 19:20 ` Nadav Amit
2020-12-23 16:23 ` Will Deacon
2020-12-23 19:04 ` Nadav Amit
2020-12-23 22:05 ` Andrea Arcangeli
2020-12-23 22:45 ` Nadav Amit
2020-12-23 23:55 ` Andrea Arcangeli
2020-12-21 21:55 ` Peter Xu
2020-12-21 23:13 ` Linus Torvalds
2020-12-21 19:53 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YAQAhyOFqEdjTRPJ@google.com \
--to=yuzhao@google.com \
--cc=aarcange@redhat.com \
--cc=ldufour@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mgorman@suse.de \
--cc=mike.kravetz@oracle.com \
--cc=minchan@kernel.org \
--cc=nadav.amit@gmail.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=rppt@linux.vnet.ibm.com \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=torvalds@linux-foundation.org \
--cc=vinmenon@codeaurora.org \
--cc=will@kernel.org \
--cc=xemul@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.