From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <433d0729-b141-4f19-a0c3-656f033e8ea1@redhat.com>
Date: Tue, 5 May 2026 21:47:58 +0300
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v9 0/5] Migrate on fault for device pages
To: Matthew Brost
Cc: Alistair Popple, linux-mm@kvack.org, dri-devel@lists.freedesktop.org,
 intel-xe@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 David Hildenbrand, Jason Gunthorpe, Leon Romanovsky, Balbir Singh,
 Zi Yan, Andrew Morton, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
References: <20260505051658.2219537-1-mpenttil@redhat.com>
Content-Language: en-US
From: Mika Penttilä
Content-Type: text/plain; charset=UTF-8

On 5/5/26 21:01, Matthew Brost wrote:
> On Tue, May 05, 2026 at 10:18:14AM +0300, Mika Penttilä wrote:
>> On 5/5/26 10:09, Alistair Popple wrote:
>>
>>> Thanks for doing this work Mika. I've been meaning to take a look at this series
>>> for a while. I'm currently at LSFMM but will try and take a look this week or
>>> next as it sounds quite useful.
>>>
>>> - Alistair
>> Thanks Alistair and no problem, appreciate your insights whenever you have time.
>>
> It looks like this series is breaking Intel's CI [1]. Looks like
> something in RCU is blowing up:
>
> <4> [212.361418] ------------[ cut here ]------------
> <4> [212.361431] Voluntary context switch within RCU read-side critical section!
> <4> [212.361432] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x82/0x780, CPU#11: kworker/u65:5/2352
> <4> [212.361440] Modules linked in: snd_hda_codec_intelhdmi snd_hda_codec_hdmi mei_lb mei_gsc_proxy mtd_intel_dg mei_gsc xe drm_gpuvm drm_gpusvm_helper drm_buddy gpu_sched drm_ttm_helper ttm drm_suballoc_helper drm_exec drm_display_helper cec rc_core drm_kunit_helpers i2c_algo_bit kunit overlay intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp eeepc_wmi cmdlinepart asus_wmi binfmt_misc sparse_keymap spi_nor mei_hdcp mei_pxp mtd wmi_bmof kvm_intel kvm irqbypass aesni_intel gf128mul r8169 usbhid rapl hid intel_cstate realtek snd_hda_intel phy_package snd_intel_dspcfg intel_pmc_core snd_hda_codec idma64 nls_iso8859_1 pmt_telemetry snd_hda_core video snd_hwdep pmt_discovery snd_pcm i2c_i801 pinctrl_alderlake pmt_class snd_timer i2c_mux intel_pmc_ssram_telemetry acpi_tad acpi_pad mei_me snd i2c_smbus spi_intel_pci soundcore mei spi_intel wmi intel_vsec dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
> <4> [212.361711] CPU: 11 UID: 0 PID: 2352 Comm: kworker/u65:5 Tainted: G S U 7.1.0-rc2-lgci-xe-xe-pw-165953v1-debug+ #1 PREEMPT(lazy)
> <4> [212.361715] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
> <4> [212.361716] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
> <4> [212.361718] Workqueue: xe_page_fault_work_queue xe_pagefault_queue_work [xe]
> <4> [212.361833] RIP: 0010:rcu_note_context_switch+0x82/0x780
> <4> [212.361838] Code: 45 85 c0 74 0f 65 8b 05 24 84 ab 02 85 c0 0f 84 8d 01 00 00 45 84 ed 75 16 8b 83 bc 08 00 00 85 c0 7e 0c 48 8d 3d de ad 4d 02 <67> 48 0f b9 3a 8b 83 bc 08 00 00 85 c0 7e 0d 80 bb c0 08 00 00 00
> <4> [212.361840] RSP: 0018:ffffc9000186f4a0 EFLAGS: 00010002
> <4> [212.361843] RAX: 0000000000000001 RBX: ffff88810a3a8040 RCX: 0000000000000000
> <4> [212.361845] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff839bcea0
> <4> [212.361846] RBP: ffffc9000186f4e8 R08: 0000000000000001 R09: 0000000000000000
> <4> [212.361848] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88885f1b6a00
> <4> [212.361849] R13: 0000000000000000 R14: ffffffff83248312 R15: ffffc9000186f630
> <4> [212.361851] FS: 0000000000000000(0000) GS:ffff8888db203000(0000) knlGS:0000000000000000
> <4> [212.361853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [212.361854] CR2: 00007fe433b2f088 CR3: 000000000344a000 CR4: 0000000000f52ef0
> <4> [212.361856] PKRU: 55555554
> <4> [212.361858] Call Trace:
> <4> [212.361859]
> <4> [212.361862] ? lock_is_held_type+0xa3/0x130
> <4> [212.361868] __schedule+0x103/0x1f70
> <4> [212.361870] ? lock_acquire+0xc4/0x300
> <4> [212.361874] ? find_held_lock+0x31/0x90
> <4> [212.361877] ? schedule+0x10e/0x180
> <4> [212.361880] ? lock_release+0xd0/0x2b0
> <4> [212.361885] schedule+0x3a/0x180
> <4> [212.361888] io_schedule+0x4c/0x80
> <4> [212.361890] ? softleaf_entry_wait_on_locked+0x147/0x2b0
> <4> [212.361894] softleaf_entry_wait_on_locked+0x24f/0x2b0
> <4> [212.361899] ? __pfx_wake_page_function+0x10/0x10
> <4> [212.361904] migration_entry_wait+0xff/0x190
> <4> [212.361909] hmm_vma_handle_pte+0x440/0x790
> <4> [212.361914] hmm_vma_walk_pmd+0x5c8/0x1360
> <4> [212.361918] ? xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362015] walk_pgd_range+0x57f/0xd70
> <4> [212.362017] ? lock_is_held_type+0xa3/0x130
> <4> [212.362028] __walk_page_range+0x8e/0x290
> <4> [212.362034] walk_page_range_mm_unsafe+0x19e/0x270
> <4> [212.362036] ? trace_hardirqs_on+0x22/0xf0
> <4> [212.362043] walk_page_range+0x2a/0x40
> <4> [212.362045] hmm_range_fault+0x94/0x190
> <4> [212.362053] drm_gpusvm_get_pages+0x269/0xa30 [drm_gpusvm_helper]
> <4> [212.362067] drm_gpusvm_range_get_pages+0x2e/0x50 [drm_gpusvm_helper]
> <4> [212.362071] __xe_svm_handle_pagefault+0x3e0/0xef0 [xe]
> <4> [212.362181] ? __lock_acquire+0x43e/0x2790
> <4> [212.362188] ? lock_is_held_type+0xa3/0x130
> <4> [212.362193] ? lock_is_held_type+0xa3/0x130
> <4> [212.362197] ? xe_vm_find_overlapping_vma+0x57/0x1e0 [xe]
> <4> [212.362304] xe_svm_handle_pagefault+0x3d/0xb0 [xe]
> <4> [212.362412] xe_pagefault_queue_work+0x1a9/0x520 [xe]
> <4> [212.362509] process_one_work+0x239/0x740
> <4> [212.362518] worker_thread+0x200/0x3f0
> <4> [212.362521] ? __pfx_worker_thread+0x10/0x10
> <4> [212.362524] kthread+0x10d/0x150
> <4> [212.362527] ? __pfx_kthread+0x10/0x10
> <4> [212.362530] ret_from_fork+0x3bd/0x470
> <4> [212.362533] ? __pfx_kthread+0x10/0x10
> <4> [212.362536] ret_from_fork_asm+0x1a/0x30
> <4> [212.362546]
> <4> [212.362547] irq event stamp: 2057044
>
> I’ll be out this Thursday for five weeks, but assuming you can sort this
> part out, I’m fine with the series moving forward. I’ve looked at this
> several times, and it seems sane enough to me.
>
> On our list we also have the Sashiko setup [2], which I’ve found to be
> incredibly helpful for series that do deep MM work. I’m not sure why
> Sashiko is saying this series didn’t apply, since it applied cleanly to
> our CI branches. If you can get Sashiko to run on it, that might be
> helpful as well.
>
> Matt

Yes there seemed to be a missing pte_unmap() before migration_entry_wait()... fixed and sent v10.

--Mika

>
> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-165953v1/shard-bmg-4/igt@xe_exec_system_allocator@process-many-stride-mmap-race-nomemset.html
> [2] https://sashiko.dev/#/patchset/20260505051658.2219537-1-mpenttil%40redhat.com
>
>
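
For anyone reading the splat without the patch at hand, here is a rough userspace sketch of why a missing pte_unmap() before migration_entry_wait() would produce the "Voluntary context switch within RCU read-side critical section" warning. It is a toy model, not kernel code and not the actual v10 fix. It assumes that mapping a PTE enters an RCU read-side section and that pte_unmap() leaves it, which is my understanding of current kernels. Every name in it (rcu_depth, model_pte_map(), model_pte_unmap(), model_sleep(), model_entry_wait(), handle_migration_entry()) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a task's RCU read-side nesting depth (assumption). */
static int rcu_depth;
/* Set when the model sleeps while rcu_depth is still non-zero. */
static bool rcu_warned;

/* Models mapping a PTE: assumed to enter an RCU read-side section. */
static void model_pte_map(void)
{
    rcu_depth++;
}

/* Models pte_unmap(): assumed to leave that RCU read-side section. */
static void model_pte_unmap(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* Models a voluntary context switch, e.g. io_schedule() in the trace. */
static void model_sleep(void)
{
    if (rcu_depth > 0)
        rcu_warned = true; /* the condition the warning reports */
}

/* Models the wait: it maps and unmaps its own PTE, then sleeps. */
static void model_entry_wait(void)
{
    model_pte_map();
    model_pte_unmap();
    model_sleep();
}

/*
 * Models a page walker that finds a migration entry.
 * Returns true if the model would have hit the RCU warning.
 */
bool handle_migration_entry(bool unmap_first)
{
    rcu_depth = 0;
    rcu_warned = false;

    model_pte_map();            /* walker holds a mapped PTE */
    if (unmap_first)
        model_pte_unmap();      /* drop the mapping before waiting */
    model_entry_wait();         /* may sleep */
    if (!unmap_first)
        model_pte_unmap();      /* too late: the sleep already happened */

    return rcu_warned;
}
```

Calling handle_migration_entry(false) models the failing path, where the walker still holds its mapping when the wait sleeps. Calling handle_migration_entry(true) models unmapping first, in which case the depth is back to zero before the sleep.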