From: Anthony Yznaga <anthony.yznaga@oracle.com>
To: Jann Horn <jannh@google.com>
Cc: akpm@linux-foundation.org, willy@infradead.org,
markhemm@googlemail.com, viro@zeniv.linux.org.uk,
david@redhat.com, khalid@kernel.org, andreyknvl@gmail.com,
dave.hansen@intel.com, luto@kernel.org, brauner@kernel.org,
arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, mhiramat@kernel.org, rostedt@goodmis.org,
vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com,
neilb@suse.de, maz@kernel.org
Subject: Re: [PATCH v2 08/20] mm/mshare: flush all TLBs when updating PTEs in an mshare range
Date: Fri, 30 May 2025 15:47:08 -0700
Message-ID: <a66997b6-60e4-4bfc-9437-89924c2ed3aa@oracle.com>
In-Reply-To: <CAG48ez3TTicKSxXyScmqq5Gg91+-KCSk80EccwkbvsQjLzjCFA@mail.gmail.com>

On 5/30/25 10:46 AM, Jann Horn wrote:
> On Fri, May 30, 2025 at 6:30 PM Anthony Yznaga
> <anthony.yznaga@oracle.com> wrote:
>> On 5/30/25 7:41 AM, Jann Horn wrote:
>>> On Fri, Apr 4, 2025 at 4:18 AM Anthony Yznaga <anthony.yznaga@oracle.com> wrote:
>>>> Unlike the mm of a task, an mshare host mm is not updated on context
>>>> switch. In particular this means that mm_cpumask is never updated
>>>> which results in TLB flushes for updates to mshare PTEs only being
>>>> done on the local CPU. To ensure entries are flushed for non-local
>>>> TLBs, set up an mmu notifier on the mshare mm and use the
>>>> .arch_invalidate_secondary_tlbs callback to flush all TLBs.
>>>> arch_invalidate_secondary_tlbs guarantees that TLB entries will be
>>>> flushed before pages are freed when unmapping pages in an mshare region.
>>>
>>> Thanks for working on this, I think this is a really nice feature.
>>>
>>> An issue that I think this series doesn't address is:
>>> There could be mmu_notifiers (for things like KVM or SVA IOMMU) that
>>> want to be notified on changes to an mshare VMA; if those are not
>>> invoked, we could get UAF of page contents. So either we propagate MMU
>>> notifier invocations in the host mm into the mshare regions that use
>>> it, or we'd have to somehow prevent a process from using MMU notifiers
>>> and mshare at the same time.
>>
>> Thanks, Jann. I've noted this as an issue. Ultimately I think the
>> notifier calls will need to be propagated. It's going to be tricky, but
>> I have some ideas.
>
> Very naively I think you could basically register your own notifier on
> the host mm that has notifier callbacks vaguely like this that walk
> the rmap of the mshare file and invoke nested mmu notifiers on each
> VMA that maps the file, basically like unmap_mapping_pages() except
> that you replace unmap_mapping_range_vma() with a notifier invocation?
>
> static int mshare_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>                 const struct mmu_notifier_range *range)
> {
>         struct vm_area_struct *vma;
>         pgoff_t first_index, last_index;
>
>         if (range->end < host_mm->mmap_base)
>                 return 0;
>         first_index = (max(range->start, host_mm->mmap_base) -
>                        host_mm->mmap_base) / PAGE_SIZE;
>         last_index = (range->end - host_mm->mmap_base) / PAGE_SIZE;
>         i_mmap_lock_read(mapping);
>         vma_interval_tree_foreach(vma, &mapping->i_mmap, first_index, last_index) {
>                 struct mmu_notifier_range nested_range;
>
>                 [... same math as in unmap_mapping_range_tree ...]
>                 mmu_notifier_range_init(&nested_range, range->event, vma->vm_mm,
>                                         nested_start, nested_end);
>                 mmu_notifier_invalidate_range_start(&nested_range);
>         }
>         i_mmap_unlock_read(mapping);
>         return 0;
> }
>
> And ensure that when mm_take_all_locks() encounters an mshare VMA, it
> basically recursively does mm_take_all_locks() on the mshare host mm?
>
> I think that might be enough to make it work, and the rest beyond that
> would be optimizations?
I figured the vma interval tree would need to be walked. I hadn't
considered mm_take_all_locks(), though. This is definitely a good
starting point. Thanks for this!
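
For reference, the address math in the quoted callback can be sketched in
userspace C roughly as follows. This is a minimal sketch, not code from the
series: struct sketch_vma, SKETCH_PAGE_SHIFT and both helper names are
hypothetical stand-ins, a 4 KiB page size is assumed, and the clamp follows
my reading of unmap_mapping_range_tree() in mm/memory.c. It also treats the
end of the range as exclusive and uses (end - 1) for the last index, which is
slightly tighter than the quoted pseudocode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields the math needs. */
struct sketch_vma {
	uint64_t vm_start;	/* first mapped byte in the mapping process */
	uint64_t vm_end;	/* one past the last mapped byte */
	uint64_t vm_pgoff;	/* file page offset that vm_start maps */
};

/*
 * Convert a host-mm address range [start, end) into an inclusive range of
 * page indexes in the mshare file. "base" plays the role of
 * host_mm->mmap_base in the quoted code. Returns false if the range lies
 * entirely below base or is empty.
 */
static bool sketch_range_to_index(uint64_t base, uint64_t start, uint64_t end,
				  uint64_t *first, uint64_t *last)
{
	if (end <= start || end <= base)
		return false;
	if (start < base)
		start = base;
	*first = (start - base) >> SKETCH_PAGE_SHIFT;
	*last = (end - 1 - base) >> SKETCH_PAGE_SHIFT;
	return true;
}

/*
 * Clamp the inclusive index range [first, last] to the pages this VMA maps
 * and translate the result into the VMA's own address space, giving the
 * [nested_start, nested_end) range for the nested notifier call.
 * Returns false if the VMA does not overlap the index range.
 */
static bool sketch_nested_range(const struct sketch_vma *vma,
				uint64_t first, uint64_t last,
				uint64_t *nested_start, uint64_t *nested_end)
{
	uint64_t vba, vea, zba, zea;

	assert(vma->vm_end > vma->vm_start);

	vba = vma->vm_pgoff;	/* first file page mapped by the VMA */
	vea = vba + ((vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT) - 1;
	zba = first > vba ? first : vba;	/* max(first, vba) */
	zea = last < vea ? last : vea;		/* min(last, vea) */
	if (zba > zea)
		return false;

	*nested_start = ((zba - vba) << SKETCH_PAGE_SHIFT) + vma->vm_start;
	*nested_end = ((zea - vba + 1) << SKETCH_PAGE_SHIFT) + vma->vm_start;
	return true;
}
```

In the real callback the per-VMA step would run inside the
vma_interval_tree_foreach() loop, and the resulting range would be passed to
mmu_notifier_range_init() as in the quoted code.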