From: Simona Vetter <simona.vetter@ffwll.ch>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
nouveau@lists.freedesktop.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Jonathan Corbet" <corbet@lwn.net>, "Alex Shi" <alexs@kernel.org>,
"Yanteng Si" <si.yanteng@linux.dev>,
"Karol Herbst" <kherbst@redhat.com>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Jann Horn" <jannh@google.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Peter Xu" <peterx@redhat.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Jason Gunthorpe" <jgg@nvidia.com>
Subject: Re: [PATCH v1 12/12] mm/rmap: keep mapcount untouched for device-exclusive entries
Date: Thu, 30 Jan 2025 11:37:22 +0100 [thread overview]
Message-ID: <Z5tWYpwpUfgEmeKj@phenom.ffwll.local> (raw)
In-Reply-To: <20250129115411.2077152-13-david@redhat.com>
On Wed, Jan 29, 2025 at 12:54:10PM +0100, David Hildenbrand wrote:
> Now that conversion to device-exclusive does no longer perform an
> rmap walk and the main page_vma_mapped_walk() users were taught to
> properly handle nonswap entries, let's treat device-exclusive entries just
> as if they would be present, similar to how we handle device-private
> entries already.
So the reason for handling device-private entries in rmap is so that
drivers can rely on try_to_migrate and related code to invalidate all the
various ptes even for device private memory. Otherwise no one should hit
this path, at least if my understanding is correct.
So I'm very much worried about opening a can of worms here because I think
this adds a genuine new case to all the various callers.
> This fixes swapout/migration of folios with device-exclusive entries.
>
> Likely there are still some page_vma_mapped_walk() callers that are not
> fully prepared for these entries, and where we simply want to refuse
> !pte_present() entries. They have to be fixed independently; the ones in
> mm/rmap.c are prepared.
The other worry is that maybe breaking migration is a feature, at least in
parts. If thp constantly reassembles a pmd entry because hey all the
memory is contig and userspace allocated a chunk of memory to place
atomics that alternate between cpu and gpu nicely separated by 4k pages,
then we'll thrash around invalidating ptes to no end. So might be more
fallout here.
-Sima
>
> Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/memory.c | 17 +----------------
> mm/rmap.c | 7 -------
> 2 files changed, 1 insertion(+), 23 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index db38d6ae4e74..cd689cd8a7c8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -743,20 +743,6 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
>
> VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
> PageAnonExclusive(page)), folio);
> -
> - /*
> - * No need to take a page reference as one was already
> - * created when the swap entry was made.
> - */
> - if (folio_test_anon(folio))
> - folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE);
> - else
> - /*
> - * Currently device exclusive access only supports anonymous
> - * memory so the entry shouldn't point to a filebacked page.
> - */
> - WARN_ON_ONCE(1);
> -
> set_pte_at(vma->vm_mm, address, ptep, pte);
>
> /*
> @@ -1628,8 +1614,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
> */
> WARN_ON_ONCE(!vma_is_anonymous(vma));
> rss[mm_counter(folio)]--;
> - if (is_device_private_entry(entry))
> - folio_remove_rmap_pte(folio, page, vma);
> + folio_remove_rmap_pte(folio, page, vma);
> folio_put(folio);
> } else if (!non_swap_entry(entry)) {
> /* Genuine swap entries, hence a private anon pages */
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9e2002d97d6f..4acc9f6d743a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2495,13 +2495,6 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
> /* The pte is writable, uffd-wp does not apply. */
> set_pte_at(mm, addr, fw.ptep, swp_pte);
>
> - /*
> - * TODO: The device-exclusive non-swap PTE holds a folio reference but
> - * does not count as a mapping (mapcount), which is wrong and must be
> - * fixed, otherwise RMAP walks don't behave as expected.
> - */
> - folio_remove_rmap_pte(folio, page, vma);
> -
> folio_walk_end(&fw, vma);
> *foliop = folio;
> return page;
> --
> 2.48.1
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
next prev parent reply other threads:[~2025-01-30 10:37 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-29 11:53 [PATCH v1 00/12] mm: fixes for device-exclusive entries (hmm) David Hildenbrand
2025-01-29 11:53 ` [PATCH v1 01/12] mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs David Hildenbrand
2025-01-29 21:42 ` John Hubbard
2025-01-30 8:56 ` David Hildenbrand
2025-01-30 5:46 ` Alistair Popple
2025-01-29 11:54 ` [PATCH v1 02/12] mm/rmap: reject hugetlb folios in folio_make_device_exclusive() David Hildenbrand
2025-01-30 5:47 ` Alistair Popple
2025-01-29 11:54 ` [PATCH v1 03/12] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive() David Hildenbrand
2025-01-30 5:57 ` Alistair Popple
2025-01-30 9:04 ` David Hildenbrand
2025-01-31 0:28 ` Alistair Popple
2025-01-31 9:29 ` David Hildenbrand
2025-01-30 13:46 ` Simona Vetter
2025-01-30 15:56 ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk David Hildenbrand
2025-01-30 6:11 ` Alistair Popple
2025-01-30 9:01 ` David Hildenbrand
2025-01-30 9:12 ` David Hildenbrand
2025-01-30 9:24 ` David Hildenbrand
2025-01-30 22:31 ` Alistair Popple
2025-02-04 10:56 ` David Hildenbrand
2025-01-30 9:40 ` Simona Vetter
2025-01-30 9:47 ` David Hildenbrand
2025-01-30 13:00 ` Simona Vetter
2025-01-30 15:59 ` David Hildenbrand
2025-01-31 17:00 ` Simona Vetter
2025-01-29 11:54 ` [PATCH v1 05/12] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable() David Hildenbrand
2025-01-30 9:51 ` Simona Vetter
2025-01-30 9:58 ` David Hildenbrand
2025-01-30 13:03 ` Simona Vetter
2025-01-30 23:06 ` Alistair Popple
2025-01-31 10:55 ` David Hildenbrand
2025-01-31 17:05 ` Simona Vetter
2025-02-04 10:58 ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 06/12] mm: use single SWP_DEVICE_EXCLUSIVE entry type David Hildenbrand
2025-01-30 13:43 ` Simona Vetter
2025-01-30 23:28 ` Alistair Popple
2025-01-29 11:54 ` [PATCH v1 07/12] mm/page_vma_mapped: device-private entries are not migration entries David Hildenbrand
2025-01-30 23:36 ` Alistair Popple
2025-01-31 11:06 ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 08/12] mm/rmap: handle device-exclusive entries correctly in try_to_unmap_one() David Hildenbrand
2025-01-30 10:10 ` Simona Vetter
2025-01-30 11:08 ` David Hildenbrand
2025-01-30 13:06 ` Simona Vetter
2025-01-30 14:08 ` Jason Gunthorpe
2025-01-30 16:10 ` Simona Vetter
2025-01-30 15:52 ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 09/12] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 10/12] mm/rmap: handle device-exclusive entries correctly in folio_referenced_one() David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 11/12] mm/rmap: handle device-exclusive entries correctly in page_vma_mkclean_one() David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 12/12] mm/rmap: keep mapcount untouched for device-exclusive entries David Hildenbrand
2025-01-30 10:37 ` Simona Vetter [this message]
2025-01-30 11:42 ` David Hildenbrand
2025-01-30 13:19 ` Simona Vetter
2025-01-30 15:43 ` David Hildenbrand
2025-01-31 17:13 ` Simona Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5tWYpwpUfgEmeKj@phenom.ffwll.local \
--to=simona.vetter@ffwll.ch \
--cc=Liam.Howlett@oracle.com \
--cc=airlied@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=alexs@kernel.org \
--cc=apopple@nvidia.com \
--cc=corbet@lwn.net \
--cc=dakr@kernel.org \
--cc=david@redhat.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=jannh@google.com \
--cc=jgg@nvidia.com \
--cc=jglisse@redhat.com \
--cc=kherbst@redhat.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=lyude@redhat.com \
--cc=nouveau@lists.freedesktop.org \
--cc=pasha.tatashin@soleen.com \
--cc=peterx@redhat.com \
--cc=si.yanteng@linux.dev \
--cc=simona@ffwll.ch \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox