From: Alistair Popple <apopple@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH mm-unstable v1] mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
Date: Fri, 21 Apr 2023 10:30:42 +1000 [thread overview]
Message-ID: <87v8hqdqlb.fsf@nvidia.com> (raw)
In-Reply-To: <20230418142113.439494-1-david@redhat.com>
David Hildenbrand <david@redhat.com> writes:
> Staring at the comment "Recheck VMA as permissions can change since
> migration started" in remove_migration_pte() can result in confusion,
> because if the source PTE/PMD indicates write permissions, then there
> should be no need to check VMA write permissions when restoring migration
> entries or PTE-mapping a PMD.
Thanks David, I have oft wondered about that but not stared at it to the
point of confusion. The change looks correct to me so feel free to add:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
For the mm/migrate.c parts. Also presumably if mprotect(PROT_READ) was a
problem then mprotect(PROT_NONE) would also need some kind of special
handling which I don't see.
> Commit d3cb8bf6081b ("mm: migrate: Close race between migration completion
> and mprotect") introduced the maybe_mkwrite() handling in
> remove_migration_pte() in 2014, stating that a race between mprotect() and
> migration finishing would be possible, and that we could end up with
> a writable PTE that should be readable.
>
> However, mprotect() code first updates vma->vm_flags / vma->vm_page_prot
> and then walks the page tables to (a) set all present writable PTEs to
> read-only and (b) convert all writable migration entries to readable
> migration entries. While walking the page tables and modifying the
> entries, migration code has to grab the PT locks to synchronize against
> concurrent page table modifications.
>
> Assuming migration would find a writable migration entry (while holding
> the PT lock) and replace it with a writable present PTE, surely mprotect()
> code didn't stumble over the writable migration entry yet (converting it
> into a readable migration entry) and would instead wait for the PT lock to
> convert the now present writable PTE into a read-only PTE. As mprotect()
> didn't finish yet, the behavior is just like migration didn't happen: a
> writable PTE will be converted to a read-only PTE.
>
> So it's fine to rely on the writability information in the source
> PTE/PMD and not recheck against the VMA as long as we're holding the PT
> lock to synchronize with anyone who concurrently wants to downgrade write
> permissions (like mprotect()) by first adjusting vma->vm_flags /
> vma->vm_page_prot to then walk over the page tables to adjust the page
> table entries.
>
> Running test cases that should reveal such races -- mprotect(PROT_READ)
> racing with page migration or THP splitting -- for multiple hours did
> not reveal an issue with this cleanup.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>
> This is a follow-up cleanup to [1]:
> [PATCH v1 RESEND 0/6] mm: (pte|pmd)_mkdirty() should not
> unconditionally allow for write access
>
> I wanted to be a bit careful and write some test cases to convince myself
> that I am not missing something important. Of course, there is still the
> possibility that my test cases are buggy ;)
>
> Test cases I'm running:
> https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_migration.c
> https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_thp_split.c
>
>
> [1] https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com
>
> ---
> mm/huge_memory.c | 4 ++--
> mm/migrate.c | 5 +----
> 2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c23fa39dec92..624671aaa60d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> } else {
> entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
> if (write)
> - entry = maybe_mkwrite(entry, vma);
> + entry = pte_mkwrite(entry);
> if (anon_exclusive)
> SetPageAnonExclusive(page + i);
> if (!young)
> @@ -3271,7 +3271,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> if (pmd_swp_soft_dirty(*pvmw->pmd))
> pmde = pmd_mksoft_dirty(pmde);
> if (is_writable_migration_entry(entry))
> - pmde = maybe_pmd_mkwrite(pmde, vma);
> + pmde = pmd_mkwrite(pmde);
> if (pmd_swp_uffd_wp(*pvmw->pmd))
> pmde = pmd_mkuffd_wp(pmde);
> if (!is_migration_entry_young(entry))
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5d95e09b1618..02cace7955d4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -213,16 +213,13 @@ static bool remove_migration_pte(struct folio *folio,
> if (pte_swp_soft_dirty(*pvmw.pte))
> pte = pte_mksoft_dirty(pte);
>
> - /*
> - * Recheck VMA as permissions can change since migration started
> - */
> entry = pte_to_swp_entry(*pvmw.pte);
> if (!is_migration_entry_young(entry))
> pte = pte_mkold(pte);
> if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> pte = pte_mkdirty(pte);
> if (is_writable_migration_entry(entry))
> - pte = maybe_mkwrite(pte, vma);
> + pte = pte_mkwrite(pte);
> else if (pte_swp_uffd_wp(*pvmw.pte))
> pte = pte_mkuffd_wp(pte);
prev parent reply other threads:[~2023-04-21 0:30 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 14:21 [PATCH mm-unstable v1] mm: don't check VMA write permissions if the PTE/PMD indicates write permissions David Hildenbrand
2023-04-18 15:56 ` Peter Xu
2023-04-18 15:57 ` David Hildenbrand
2023-04-18 19:01 ` Kirill A. Shutemov
2023-04-21 0:30 ` Alistair Popple [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v8hqdqlb.fsf@nvidia.com \
--to=apopple@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.