Re: [PATCH v9 0/5] Migrate on fault for device pages

public inbox for linux-kernel@vger.kernel.org

From: "Mika Penttilä" <mpenttil@redhat.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>,
	linux-mm@kvack.org, dri-devel@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@kernel.org>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Balbir Singh <balbirs@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v9 0/5] Migrate on fault for device pages
Date: Tue, 5 May 2026 21:47:58 +0300
Message-ID: <433d0729-b141-4f19-a0c3-656f033e8ea1@redhat.com>
In-Reply-To: <afowaUffi0JrnADf@gsse-cloud1.jf.intel.com>


On 5/5/26 21:01, Matthew Brost wrote:

> On Tue, May 05, 2026 at 10:18:14AM +0300, Mika Penttilä wrote:
>> On 5/5/26 10:09, Alistair Popple wrote:
>>
>>> Thanks for doing this work Mika. I've been meaning to take a look at this series
>>> for a while. I'm currently at LSFMM but will try and take a look this week or
>>> next as it sounds quite useful.
>>>
>>>  - Alistair
>> Thanks Alistair and no problem, appreciate your insights whenever you have time.
>>
> It looks like this series is breaking Intel's CI [1]. Looks like
> something in RCU is blowing up:
>
> <4> [212.361418] ------------[ cut here ]------------
> <4> [212.361431] Voluntary context switch within RCU read-side critical section!
> <4> [212.361432] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x82/0x780, CPU#11: kworker/u65:5/2352
> <4> [212.361440] Modules linked in: snd_hda_codec_intelhdmi snd_hda_codec_hdmi mei_lb mei_gsc_proxy mtd_intel_dg mei_gsc xe drm_gpuvm drm_gpusvm_helper drm_buddy gpu_sched drm_ttm_helper ttm drm_suballoc_helper drm_exec drm_display_helper cec rc_core drm_kunit_helpers i2c_algo_bit kunit overlay intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp eeepc_wmi cmdlinepart asus_wmi binfmt_misc sparse_keymap spi_nor mei_hdcp mei_pxp mtd wmi_bmof kvm_intel kvm irqbypass aesni_intel gf128mul r8169 usbhid rapl hid intel_cstate realtek snd_hda_intel phy_package snd_intel_dspcfg intel_pmc_core snd_hda_codec idma64 nls_iso8859_1 pmt_telemetry snd_hda_core video snd_hwdep pmt_discovery snd_pcm i2c_i801 pinctrl_alderlake pmt_class snd_timer i2c_mux intel_pmc_ssram_telemetry acpi_tad acpi_pad mei_me snd i2c_smbus spi_intel_pci soundcore mei spi_intel wmi intel_vsec dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
> <4> [212.361711] CPU: 11 UID: 0 PID: 2352 Comm: kworker/u65:5 Tainted: G S   U              7.1.0-rc2-lgci-xe-xe-pw-165953v1-debug+ #1 PREEMPT(lazy) 
> <4> [212.361715] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
> <4> [212.361716] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
> <4> [212.361718] Workqueue: xe_page_fault_work_queue xe_pagefault_queue_work [xe]
> <4> [212.361833] RIP: 0010:rcu_note_context_switch+0x82/0x780
> <4> [212.361838] Code: 45 85 c0 74 0f 65 8b 05 24 84 ab 02 85 c0 0f 84 8d 01 00 00 45 84 ed 75 16 8b 83 bc 08 00 00 85 c0 7e 0c 48 8d 3d de ad 4d 02 <67> 48 0f b9 3a 8b 83 bc 08 00 00 85 c0 7e 0d 80 bb c0 08 00 00 00
> <4> [212.361840] RSP: 0018:ffffc9000186f4a0 EFLAGS: 00010002
> <4> [212.361843] RAX: 0000000000000001 RBX: ffff88810a3a8040 RCX: 0000000000000000
> <4> [212.361845] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff839bcea0
> <4> [212.361846] RBP: ffffc9000186f4e8 R08: 0000000000000001 R09: 0000000000000000
> <4> [212.361848] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88885f1b6a00
> <4> [212.361849] R13: 0000000000000000 R14: ffffffff83248312 R15: ffffc9000186f630
> <4> [212.361851] FS:  0000000000000000(0000) GS:ffff8888db203000(0000) knlGS:0000000000000000
> <4> [212.361853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [212.361854] CR2: 00007fe433b2f088 CR3: 000000000344a000 CR4: 0000000000f52ef0
> <4> [212.361856] PKRU: 55555554
> <4> [212.361858] Call Trace:
> <4> [212.361859]  <TASK>
> <4> [212.361862]  ? lock_is_held_type+0xa3/0x130
> <4> [212.361868]  __schedule+0x103/0x1f70
> <4> [212.361870]  ? lock_acquire+0xc4/0x300
> <4> [212.361874]  ? find_held_lock+0x31/0x90
> <4> [212.361877]  ? schedule+0x10e/0x180
> <4> [212.361880]  ? lock_release+0xd0/0x2b0
> <4> [212.361885]  schedule+0x3a/0x180
> <4> [212.361888]  io_schedule+0x4c/0x80
> <4> [212.361890]  ? softleaf_entry_wait_on_locked+0x147/0x2b0
> <4> [212.361894]  softleaf_entry_wait_on_locked+0x24f/0x2b0
> <4> [212.361899]  ? __pfx_wake_page_function+0x10/0x10
> <4> [212.361904]  migration_entry_wait+0xff/0x190
> <4> [212.361909]  hmm_vma_handle_pte+0x440/0x790
> <4> [212.361914]  hmm_vma_walk_pmd+0x5c8/0x1360
> <4> [212.361918]  ? xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362015]  walk_pgd_range+0x57f/0xd70
> <4> [212.362017]  ? lock_is_held_type+0xa3/0x130
> <4> [212.362028]  __walk_page_range+0x8e/0x290
> <4> [212.362034]  walk_page_range_mm_unsafe+0x19e/0x270
> <4> [212.362036]  ? trace_hardirqs_on+0x22/0xf0
> <4> [212.362043]  walk_page_range+0x2a/0x40
> <4> [212.362045]  hmm_range_fault+0x94/0x190
> <4> [212.362053]  drm_gpusvm_get_pages+0x269/0xa30 [drm_gpusvm_helper]
> <4> [212.362067]  drm_gpusvm_range_get_pages+0x2e/0x50 [drm_gpusvm_helper]
> <4> [212.362071]  __xe_svm_handle_pagefault+0x3e0/0xef0 [xe]
> <4> [212.362181]  ? __lock_acquire+0x43e/0x2790
> <4> [212.362188]  ? lock_is_held_type+0xa3/0x130
> <4> [212.362193]  ? lock_is_held_type+0xa3/0x130
> <4> [212.362197]  ? xe_vm_find_overlapping_vma+0x57/0x1e0 [xe]
> <4> [212.362304]  xe_svm_handle_pagefault+0x3d/0xb0 [xe]
> <4> [212.362412]  xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362509]  process_one_work+0x239/0x740
> <4> [212.362518]  worker_thread+0x200/0x3f0
> <4> [212.362521]  ? __pfx_worker_thread+0x10/0x10
> <4> [212.362524]  kthread+0x10d/0x150
> <4> [212.362527]  ? __pfx_kthread+0x10/0x10
> <4> [212.362530]  ret_from_fork+0x3bd/0x470
> <4> [212.362533]  ? __pfx_kthread+0x10/0x10
> <4> [212.362536]  ret_from_fork_asm+0x1a/0x30
> <4> [212.362546]  </TASK>
> <4> [212.362547] irq event stamp: 2057044
>
> I’ll be out this Thursday for five weeks, but assuming you can sort this
> part out, I’m fine with the series moving forward. I’ve looked at this
> several times, and it seems sane enough to me.
>
> On our list we also have the Sashiko setup [2], which I’ve found to be
> incredibly helpful for series that do deep MM work. I’m not sure why
> Sashiko is saying this series didn’t apply, since it applied cleanly to
> our CI branches. If you can get Sashiko to run on it, that might be
> helpful as well.
>
> Matt

Yes, there seems to have been a missing pte_unmap() before migration_entry_wait(). Fixed and sent as v10.

--Mika


>
> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-165953v1/shard-bmg-4/igt@xe_exec_system_allocator@process-many-stride-mmap-race-nomemset.html
> [2] https://sashiko.dev/#/patchset/20260505051658.2219537-1-mpenttil%40redhat.com
>
>
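
For readers following the trace: as far as I understand it, on recent kernels mapping a PTE with pte_offset_map() enters an RCU read-side critical section that pte_unmap() leaves, while migration_entry_wait() can sleep (the io_schedule() frame in the trace). Sleeping with the PTE still mapped is what trips the warning. The sketch below is a userspace model of that ordering only. It is not kernel code and not the v10 patch; every name in it (struct rcu_model, model_pte_map, model_pte_unmap, model_entry_wait, walk_pte) is hypothetical and exists just for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of these are kernel functions. */
struct rcu_model {
	int depth;    /* models the RCU read-side nesting depth */
	int warnings; /* sleeps attempted while depth > 0 */
};

/* Models mapping a PTE, which enters a read-side critical section. */
static void model_pte_map(struct rcu_model *m)
{
	m->depth++;
}

/* Models unmapping the PTE, which leaves the critical section. */
static void model_pte_unmap(struct rcu_model *m)
{
	assert(m->depth > 0);
	m->depth--;
}

/* Models a wait that may sleep; sleeping inside the section is the bug. */
static void model_entry_wait(struct rcu_model *m)
{
	if (m->depth > 0)
		m->warnings++;
}

/*
 * Walk one modelled PTE and return how many warnings were recorded.
 * unmap_before_wait selects the corrected ordering (unmap, then wait).
 */
static int walk_pte(bool migration_entry, bool unmap_before_wait)
{
	struct rcu_model m = { 0, 0 };

	model_pte_map(&m);
	if (migration_entry) {
		if (unmap_before_wait)
			model_pte_unmap(&m);
		model_entry_wait(&m);
		if (!unmap_before_wait)
			model_pte_unmap(&m);
		assert(m.depth == 0);
		return m.warnings;
	}
	model_pte_unmap(&m);
	return m.warnings;
}
```

In this model, walk_pte(true, false) records one warning because the wait happens while the PTE is still mapped, and walk_pte(true, true) records none because the unmap comes first.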
