From: "Mika Penttilä" <mpenttil@redhat.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>,
linux-mm@kvack.org, dri-devel@lists.freedesktop.org,
intel-xe@lists.freedesktop.org, linux-kernel@vger.kernel.org,
David Hildenbrand <david@kernel.org>,
Jason Gunthorpe <jgg@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
Balbir Singh <balbirs@nvidia.com>, Zi Yan <ziy@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v9 0/5] Migrate on fault for device pages
Date: Tue, 5 May 2026 21:47:58 +0300
Message-ID: <433d0729-b141-4f19-a0c3-656f033e8ea1@redhat.com>
In-Reply-To: <afowaUffi0JrnADf@gsse-cloud1.jf.intel.com>
On 5/5/26 21:01, Matthew Brost wrote:
> On Tue, May 05, 2026 at 10:18:14AM +0300, Mika Penttilä wrote:
>> On 5/5/26 10:09, Alistair Popple wrote:
>>
>>> Thanks for doing this work Mika. I've been meaning to take a look at this series
>>> for a while. I'm currently at LSFMM but will try and take a look this week or
>>> next as it sounds quite useful.
>>>
>>> - Alistair
>> Thanks Alistair and no problem, appreciate your insights whenever you have time.
>>
> It looks like this series is breaking Intel's CI [1]. Looks like
> something in RCU is blowing up:
>
> <4> [212.361418] ------------[ cut here ]------------
> <4> [212.361431] Voluntary context switch within RCU read-side critical section!
> <4> [212.361432] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x82/0x780, CPU#11: kworker/u65:5/2352
> <4> [212.361440] Modules linked in: snd_hda_codec_intelhdmi snd_hda_codec_hdmi mei_lb mei_gsc_proxy mtd_intel_dg mei_gsc xe drm_gpuvm drm_gpusvm_helper drm_buddy gpu_sched drm_ttm_helper ttm drm_suballoc_helper drm_exec drm_display_helper cec rc_core drm_kunit_helpers i2c_algo_bit kunit overlay intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp eeepc_wmi cmdlinepart asus_wmi binfmt_misc sparse_keymap spi_nor mei_hdcp mei_pxp mtd wmi_bmof kvm_intel kvm irqbypass aesni_intel gf128mul r8169 usbhid rapl hid intel_cstate realtek snd_hda_intel phy_package snd_intel_dspcfg intel_pmc_core snd_hda_codec idma64 nls_iso8859_1 pmt_telemetry snd_hda_core video snd_hwdep pmt_discovery snd_pcm i2c_i801 pinctrl_alderlake pmt_class snd_timer i2c_mux intel_pmc_ssram_telemetry acpi_tad acpi_pad mei_me snd i2c_smbus spi_intel_pci soundcore mei spi_intel wmi intel_vsec dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
> <4> [212.361711] CPU: 11 UID: 0 PID: 2352 Comm: kworker/u65:5 Tainted: G S U 7.1.0-rc2-lgci-xe-xe-pw-165953v1-debug+ #1 PREEMPT(lazy)
> <4> [212.361715] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
> <4> [212.361716] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
> <4> [212.361718] Workqueue: xe_page_fault_work_queue xe_pagefault_queue_work [xe]
> <4> [212.361833] RIP: 0010:rcu_note_context_switch+0x82/0x780
> <4> [212.361838] Code: 45 85 c0 74 0f 65 8b 05 24 84 ab 02 85 c0 0f 84 8d 01 00 00 45 84 ed 75 16 8b 83 bc 08 00 00 85 c0 7e 0c 48 8d 3d de ad 4d 02 <67> 48 0f b9 3a 8b 83 bc 08 00 00 85 c0 7e 0d 80 bb c0 08 00 00 00
> <4> [212.361840] RSP: 0018:ffffc9000186f4a0 EFLAGS: 00010002
> <4> [212.361843] RAX: 0000000000000001 RBX: ffff88810a3a8040 RCX: 0000000000000000
> <4> [212.361845] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff839bcea0
> <4> [212.361846] RBP: ffffc9000186f4e8 R08: 0000000000000001 R09: 0000000000000000
> <4> [212.361848] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88885f1b6a00
> <4> [212.361849] R13: 0000000000000000 R14: ffffffff83248312 R15: ffffc9000186f630
> <4> [212.361851] FS: 0000000000000000(0000) GS:ffff8888db203000(0000) knlGS:0000000000000000
> <4> [212.361853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [212.361854] CR2: 00007fe433b2f088 CR3: 000000000344a000 CR4: 0000000000f52ef0
> <4> [212.361856] PKRU: 55555554
> <4> [212.361858] Call Trace:
> <4> [212.361859] <TASK>
> <4> [212.361862] ? lock_is_held_type+0xa3/0x130
> <4> [212.361868] __schedule+0x103/0x1f70
> <4> [212.361870] ? lock_acquire+0xc4/0x300
> <4> [212.361874] ? find_held_lock+0x31/0x90
> <4> [212.361877] ? schedule+0x10e/0x180
> <4> [212.361880] ? lock_release+0xd0/0x2b0
> <4> [212.361885] schedule+0x3a/0x180
> <4> [212.361888] io_schedule+0x4c/0x80
> <4> [212.361890] ? softleaf_entry_wait_on_locked+0x147/0x2b0
> <4> [212.361894] softleaf_entry_wait_on_locked+0x24f/0x2b0
> <4> [212.361899] ? __pfx_wake_page_function+0x10/0x10
> <4> [212.361904] migration_entry_wait+0xff/0x190
> <4> [212.361909] hmm_vma_handle_pte+0x440/0x790
> <4> [212.361914] hmm_vma_walk_pmd+0x5c8/0x1360
> <4> [212.361918] ? xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362015] walk_pgd_range+0x57f/0xd70
> <4> [212.362017] ? lock_is_held_type+0xa3/0x130
> <4> [212.362028] __walk_page_range+0x8e/0x290
> <4> [212.362034] walk_page_range_mm_unsafe+0x19e/0x270
> <4> [212.362036] ? trace_hardirqs_on+0x22/0xf0
> <4> [212.362043] walk_page_range+0x2a/0x40
> <4> [212.362045] hmm_range_fault+0x94/0x190
> <4> [212.362053] drm_gpusvm_get_pages+0x269/0xa30 [drm_gpusvm_helper]
> <4> [212.362067] drm_gpusvm_range_get_pages+0x2e/0x50 [drm_gpusvm_helper]
> <4> [212.362071] __xe_svm_handle_pagefault+0x3e0/0xef0 [xe]
> <4> [212.362181] ? __lock_acquire+0x43e/0x2790
> <4> [212.362188] ? lock_is_held_type+0xa3/0x130
> <4> [212.362193] ? lock_is_held_type+0xa3/0x130
> <4> [212.362197] ? xe_vm_find_overlapping_vma+0x57/0x1e0 [xe]
> <4> [212.362304] xe_svm_handle_pagefault+0x3d/0xb0 [xe]
> <4> [212.362412] xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362509] process_one_work+0x239/0x740
> <4> [212.362518] worker_thread+0x200/0x3f0
> <4> [212.362521] ? __pfx_worker_thread+0x10/0x10
> <4> [212.362524] kthread+0x10d/0x150
> <4> [212.362527] ? __pfx_kthread+0x10/0x10
> <4> [212.362530] ret_from_fork+0x3bd/0x470
> <4> [212.362533] ? __pfx_kthread+0x10/0x10
> <4> [212.362536] ret_from_fork_asm+0x1a/0x30
> <4> [212.362546] </TASK>
> <4> [212.362547] irq event stamp: 2057044
>
> I’ll be out this Thursday for five weeks, but assuming you can sort this
> part out, I’m fine with the series moving forward. I’ve looked at this
> several times, and it seems sane enough to me.
>
> On our list we also have the Sashiko setup [2], which I’ve found to be
> incredibly helpful for series that do deep MM work. I’m not sure why
> Sashiko is saying this series didn’t apply, since it applied cleanly to
> our CI branches. If you can get Sashiko to run on it, that might be
> helpful as well.
>
> Matt
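Yes, there seemed to be a missing pte_unmap() before migration_entry_wait() in hmm_vma_handle_pte(). Because of that, we slept while still inside the RCU read-side section held by the PTE mapping. Fixed and sent v10.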
--Mika
>
> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-165953v1/shard-bmg-4/igt@xe_exec_system_allocator@process-many-stride-mmap-race-nomemset.html
> [2] https://sashiko.dev/#/patchset/20260505051658.2219537-1-mpenttil%40redhat.com
>
>
Thread overview: 10+ messages
2026-05-05 5:16 [PATCH v9 0/5] Migrate on fault for device pages mpenttil
2026-05-05 5:16 ` [PATCH v9 1/5] mm/Kconfig: changes for migrate " mpenttil
2026-05-05 5:16 ` [PATCH v9 2/5] mm: Add helper to convert HMM pfn to migrate pfn mpenttil
2026-05-05 5:16 ` [PATCH v9 3/5] mm/hmm: do the plumbing for HMM to participate in migration mpenttil
2026-05-05 5:16 ` [PATCH v9 4/5] mm: setup device page migration in HMM pagewalk mpenttil
2026-05-05 5:16 ` [PATCH v9 5/5] lib/test_hmm:: add a new testcase for the migrate on fault mpenttil
2026-05-05 7:09 ` [PATCH v9 0/5] Migrate on fault for device pages Alistair Popple
2026-05-05 7:18 ` Mika Penttilä
2026-05-05 18:01 ` Matthew Brost
2026-05-05 18:47 ` Mika Penttilä [this message]