From: David Hildenbrand <david@redhat.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
nouveau@lists.freedesktop.org,
linux-trace-kernel@vger.kernel.org,
linux-perf-users@vger.kernel.org, damon@lists.linux.dev,
"Andrew Morton" <akpm@linux-foundation.org>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Jonathan Corbet" <corbet@lwn.net>, "Alex Shi" <alexs@kernel.org>,
"Yanteng Si" <si.yanteng@linux.dev>,
"Karol Herbst" <kherbst@redhat.com>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Oleg Nesterov" <oleg@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"SeongJae Park" <sj@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Jann Horn" <jannh@google.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Peter Xu" <peterx@redhat.com>,
"Jason Gunthorpe" <jgg@nvidia.com>
Subject: Re: [PATCH v2 00/17] mm: fixes for device-exclusive entries (hmm)
Date: Thu, 13 Feb 2025 12:15:58 +0100
Message-ID: <039b2e48-1d7c-48dc-b832-24db12af216a@redhat.com>
In-Reply-To: <6sejv2hauce3il5lq6sw53xmjjjglxkhz5copm62oryga6jioi@u66wl2nc3hoy>

On 13.02.25 12:03, Alistair Popple wrote:
> On Mon, Feb 10, 2025 at 08:37:42PM +0100, David Hildenbrand wrote:
>> Against mm-hotfixes-stable for now.
>>
>> Discussing the PageTail() call in make_device_exclusive_range() with
>> Willy, I recently discovered [1] that device-exclusive handling does
>> not properly work with THP, making the hmm-tests selftests fail if THPs
>> are enabled on the system.
>>
>> Looking into more details, I found that hugetlb is not properly fenced,
>> and I realized that something that was bugging me for longer -- how
>> device-exclusive entries interact with mapcounts -- completely breaks
>> migration/swapout/split/hwpoison handling of these folios while they have
>> device-exclusive PTEs.
>>
>> The program below can be used to allocate 1 GiB worth of pages and
>> make them device-exclusive on a kernel with CONFIG_TEST_HMM.
>>
>> Once they are device-exclusive, these folios cannot get swapped out
>> (/proc/$pid/smaps_rollup will always indicate 1 GiB RSS no matter how
>> much one forces memory reclaim), and when having a memory block onlined
>> to ZONE_MOVABLE, trying to offline it will loop forever and complain about
>> failed migration of a page that should be movable.
>>
>> # echo offline > /sys/devices/system/memory/memory136/state
>> # echo online_movable > /sys/devices/system/memory/memory136/state
>> # ./hmm-swap &
>> ... wait until everything is device-exclusive
>> # echo offline > /sys/devices/system/memory/memory136/state
>> [ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000
>> index:0x7f20671f7 pfn:0x442b6a
>> [ 285.196618][T14882] memcg:ffff888179298000
>> [ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|
>> dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
>> [ 285.201734][T14882] raw: ...
>> [ 285.204464][T14882] raw: ...
>> [ 285.207196][T14882] page dumped because: migration failure
>> [ 285.209072][T14882] page_owner tracks the page as allocated
>> [ 285.210915][T14882] page last allocated via order 0, migratetype
>> Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),
>> id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774
>> [ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
>> [ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
>> [ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
>> [ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
>> [ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
>> [ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
>> [ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
>> [ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
>> ...
>>
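
For anyone reproducing the report quoted above: the "RSS stays at 1 GiB"
observation can be checked by reading the Rss: line of
/proc/$pid/smaps_rollup before and after forcing reclaim. The following is
only a minimal sketch of such a check, not part of the series or of the
hmm-swap reproducer; the helper names parse_rss_kb() and read_rss_kb() are
made up for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse the "Rss:" value (in kB) out of text in
 * smaps_rollup format. Returns -1 if no line starts with "Rss:".
 */
long parse_rss_kb(const char *text)
{
	const char *line = text;

	while (line && *line) {
		long kb;

		/* Only matches when the current line begins with "Rss:". */
		if (sscanf(line, "Rss: %ld kB", &kb) == 1)
			return kb;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: read /proc/<pid>/smaps_rollup and return the Rss
 * value in kB, or -1 if the file cannot be read or has no Rss line.
 */
long read_rss_kb(pid_t pid)
{
	char path[64];
	char buf[4096];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/smaps_rollup", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	/* smaps_rollup is a short file; one bounded read is assumed to do. */
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	return parse_rss_kb(buf);
}
```

Calling read_rss_kb() with the reproducer's PID before and after forcing
reclaim and comparing the two values is enough: if the result stays at
roughly 1048576 kB (1 GiB), the folios were not swapped out.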
>> This series fixes all issues I found so far. There is no easy way to fix
>> without a bigger rework/cleanup. I have a bunch of cleanups on top (some
>> previously sent, some the result of the discussion in v1) that I will send
>> out separately once this has landed and I get to it.
>
> First off David thanks for finding and fixing these issues. If you have further
> clean-ups in mind that you need help with please let me know as I'd be happy
> to help.

Sure! I have some cleanups TBD as a result of the previous discussion, but
nothing bigger so far.

(Removing the folio lock could be considered bigger, if we want to go
down that path.)

>
>> I wish we could just use some special present PROT_NONE PTEs instead of
>> these (non-present, non-none) fake-swap entries; but that just results in
>> the same problem we keep having (lack of spare PTE bits), and staring at
>> other similar fake-swap entries, that ship has sailed.
>>
>> With this series, make_device_exclusive() doesn't actually belong in
>> mm/rmap.c anymore, but I'll leave moving that for another day.
>>
>> I only tested this series with the hmm-tests selftests due to lack of HW,
>> so I'd appreciate some testing, especially if the interaction between
>> two GPUs wanting a device-exclusive entry works as expected.
>
> I'm still reviewing the series but so far testing on my single GPU system
> appears to be working as expected. I will try and fire up a dual GPU system
> tomorrow and test it there as well.

Great, thanks a bunch for testing!

Out of interest: does the nvidia driver make use of this interface as
well, and are you testing with that or with the nouveau driver? I saw
some reports that nvidia at least checks for it [1] when building the
module:

  CONFTEST: make_device_exclusive_range

[1] https://www.googlecloudcommunity.com/gc/AI-ML/Can-t-Install-Nvidia-Drivers-on-6-1-0-18-Kernel/m-p/722596

--
Cheers,

David / dhildenb

Thread overview: 31+ messages
2025-02-10 19:37 [PATCH v2 00/17] mm: fixes for device-exclusive entries (hmm) David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 01/17] mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 02/17] mm/rmap: reject hugetlb folios in folio_make_device_exclusive() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 03/17] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive() David Hildenbrand
2025-02-11 5:00 ` Andrew Morton
2025-02-11 8:33 ` David Hildenbrand
2025-02-17 0:01 ` Alistair Popple
2025-02-17 9:32 ` David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 04/17] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 05/17] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 06/17] mm: use single SWP_DEVICE_EXCLUSIVE entry type David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 07/17] mm/page_vma_mapped: device-exclusive entries are not migration entries David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 08/17] kernel/events/uprobes: handle device-exclusive entries correctly in __replace_page() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 09/17] mm/ksm: handle device-exclusive entries correctly in write_protect_page() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 10/17] mm/rmap: handle device-exclusive entries correctly in try_to_unmap_one() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 11/17] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 12/17] mm/rmap: handle device-exclusive entries correctly in page_vma_mkclean_one() David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 13/17] mm/page_idle: handle device-exclusive entries correctly in page_idle_clear_pte_refs_one() David Hildenbrand
2025-02-11 20:48 ` SeongJae Park
2025-02-10 19:37 ` [PATCH v2 14/17] mm/damon: handle device-exclusive entries correctly in damon_folio_young_one() David Hildenbrand
2025-02-11 6:59 ` SeongJae Park
2025-02-10 19:37 ` [PATCH v2 15/17] mm/damon: handle device-exclusive entries correctly in damon_folio_mkold_one() David Hildenbrand
2025-02-11 7:00 ` SeongJae Park
2025-02-10 19:37 ` [PATCH v2 16/17] mm/rmap: keep mapcount untouched for device-exclusive entries David Hildenbrand
2025-02-10 19:37 ` [PATCH v2 17/17] mm/rmap: avoid -EBUSY from make_device_exclusive() David Hildenbrand
2025-02-10 23:05 ` [PATCH v2 00/17] mm: fixes for device-exclusive entries (hmm) Andrew Morton
2025-02-10 23:39 ` Barry Song
2025-02-13 11:03 ` Alistair Popple
2025-02-13 11:15 ` David Hildenbrand [this message]
2025-02-14 1:25 ` Alistair Popple
2025-02-14 10:37 ` David Hildenbrand