Re: 回复: [PATCH 0/3] mm/mmu_notifier, drm/amdgpu: block THP for GPU user mappings

From: "Christian König" <christian.koenig@amd.com>
To: "蒋 亦韬" <jytscientist@hotmail.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Hildenbrand" <david@kernel.org>,
	"Lorenzo Stoakes" <ljs@kernel.org>,
	"Yang, Philip" <Philip.Yang@amd.com>
Cc: Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <liam@infradead.org>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: 回复: [PATCH 0/3] mm/mmu_notifier, drm/amdgpu: block THP for GPU user mappings
Date: Thu, 25 Jun 2026 15:06:15 +0200
Message-ID: <b3585357-bd02-44ba-8775-62cd9a7aa13a@amd.com>
In-Reply-To: <SY5PR01MB10599BCF8625DB8EAE9827293C0EC2@SY5PR01MB10599.ausprd01.prod.outlook.com>

Hi Yitao,

Adding Philip Yang.

Thanks for the investigation. That sounds like some kind of bug in the KFD SVM handling; the driver should be perfectly capable of handling this.

I strongly suggest opening a bug report for ROCm that describes how to reproduce this. Philip can probably point you to the right place for that.
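
For the reproduction steps, the CPU-side part of the trigger discussed below can be sketched roughly as follows. This is only an illustration under assumptions, not the reporter's actual reproducer: the helper name `request_thp`, the 2 MiB huge page size and the buffer size are made up for the example, the mapping is not guaranteed to be huge-page aligned, and the step that registers the buffer with the GPU (userptr or SVM) is deliberately left out.

```python
import mmap

# Assumption: 2 MiB PMD-sized transparent huge pages, as on x86_64.
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def request_thp(length=4 * HUGE_PAGE_SIZE):
    """Map anonymous memory, fault it in, then hint the kernel to use THP."""
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    # Touch one byte per base page so the range is populated before the hint.
    for offset in range(0, length, mmap.PAGESIZE):
        buf[offset] = 1

    # A real reproducer would register `buf` with the GPU at this point
    # (hypothetical step, omitted here).

    advised = False
    madv_hugepage = getattr(mmap, "MADV_HUGEPAGE", None)
    if madv_hugepage is not None and hasattr(buf, "madvise"):
        try:
            # Equivalent to madvise(addr, length, MADV_HUGEPAGE) on the mapping.
            buf.madvise(madv_hugepage)
            advised = True
        except OSError:
            # The kernel may reject the hint, e.g. when THP is not configured.
            pass
    return buf, advised
```

Calling `buf, advised = request_thp()` returns the mapping and whether the hint was accepted. An accepted hint does not guarantee that the range is actually collapsed, so a bug report should also state the THP settings of the test system.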

Regards,
Christian.

On 6/25/26 15:01, 蒋 亦韬 wrote:
> Hi Christian,
> 
> I agree that my previous approach was wrong. Sorry about that. Please let me clarify the problem I was seeing and how I ended up with that incorrect conclusion.
> 
> The original problem was not a synthetic THP test. I was running ROCm/PyTorch ML training on an AMD Radeon 780M system, and the workload frequently failed with asynchronous HIP kernel launch failures. The userspace error usually surfaced later in PyTorch, for example around a copy/to_device/SetDevice path, but the kernel log showed GPU resets and KFD/MES queue eviction failures.
> 
> The relevant kernel messages I repeatedly saw were along these lines:
> 
>   MES failed to respond to msg=REMOVE_QUEUE
>   MES failed to respond to msg=SUSPEND
>   failed to suspend all gangs
>   failed to remove hardware queue from MES
>   Failed to evict queue
>   Failed to evict process queues
>   GPU reset begin
> 
> While trying to reduce the issue, I saw memory invalidations and THP-related page-table/backing-page activity driving the AMDGPU/KFD path through SVM eviction. On this system, the path I was looking at was roughly:
> 
>   svm_range_cpu_invalidate_pagetables()
>     -> svm_range_evict()
>     -> kgd2kfd_quiesce_mm()
>     -> KFD process queue eviction
>     -> MES REMOVE_QUEUE / SUSPEND
> 
> One thing that misled me was the XNACK-disabled path. Since the issue appeared on an XNACK-disabled APU, and that path requires queue eviction/quiesce when CPU page table invalidations affect GPU mappings, I incorrectly thought the backing-page change itself was something the driver had to prevent.
> 
> Another thing that misled me was that the application was not intentionally asking for THP behavior. From the workload’s point of view, these page transitions looked unrelated to the model computation. I therefore incorrectly assumed that userspace should not be able to change backing-page characteristics in a way that affects a driver mapping already registered with MMU interval notifiers. I now understand from the MM feedback that this is expected behavior, and that the notifier user must handle unmap/remap correctly.
> 
> So the more precise problem is that THP/remap is only one way to trigger the invalidation path. What is failing for my workload is the AMDGPU/KFD/MES queue quiesce/eviction path during those invalidations. When that fails, the GPU resets, and userspace later observes an asynchronous HIP failure.
> 
> Please allow me to continue investigating a more appropriate fix for this problem. I will try to keep the fix boundary within AMDGPU/KFD/MES and avoid changing MM-core or THP policy semantics.
> 
> Regards,
> Yitao
> ------------------------------------------------------------------------
> *From:* Christian König <christian.koenig@amd.com>
> *Sent:* 25 June 2026 8:35
> *To:* Yitao Jiang <jytscientist@hotmail.com>; Alex Deucher <alexander.deucher@amd.com>; David Airlie <airlied@gmail.com>; Simona Vetter <simona@ffwll.ch>; Felix Kuehling <Felix.Kuehling@amd.com>; Andrew Morton <akpm@linux-foundation.org>; David Hildenbrand <david@kernel.org>; Lorenzo Stoakes <ljs@kernel.org>
> *Cc:* Zi Yan <ziy@nvidia.com>; Baolin Wang <baolin.wang@linux.alibaba.com>; Liam R . Howlett <liam@infradead.org>; Nico Pache <npache@redhat.com>; Ryan Roberts <ryan.roberts@arm.com>; Dev Jain <dev.jain@arm.com>; Barry Song <baohua@kernel.org>; Lance Yang <lance.yang@linux.dev>; Vlastimil Babka <vbabka@kernel.org>; Mike Rapoport <rppt@kernel.org>; Suren Baghdasaryan <surenb@google.com>; Michal Hocko <mhocko@suse.com>; Jann Horn <jannh@google.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; linux-mm@kvack.org <linux-mm@kvack.org>
> *Subject:* Re: [PATCH 0/3] mm/mmu_notifier, drm/amdgpu: block THP for GPU user mappings
>  
> On 6/25/26 12:59, Yitao Jiang wrote:
>> Hi,
>> 
>> This series fixes a THP policy problem I found while debugging
>> frequent ROCm GPU failures on an AMD Radeon 780M system during ML
>> training.
>> 
>> Some AMDGPU/KFD user mappings are registered through interval
>> notifiers and cannot safely tolerate the backing VMA changing from base
>> pages to a transparent huge page after registration.
> 
> That's certainly not correct. This is a must-have for a whole lot of use cases.
> 
> Why exactly isn't that working for your use case?
> 
> Regards,
> Christian.
> 
>> Userspace can
>> still apply MADV_HUGEPAGE or MADV_COLLAPSE, and khugepaged can also
>> collapse the range, after the GPU mapping has been registered.
>> 
>> On my system this showed up as asynchronous ROCm/HIP kernel launch
>> failures, often reported later at a synchronization or copy point. I
>> expect the issue to be relevant to AMDGPU/KFD mappings on
>> XNACK-disabled GPUs more generally, because those mappings cannot rely
>> on replayable GPU faults after a CPU-side THP remap. I have validated
>> the failure and fix on AMD Radeon 780M / gfx1103.
>> 
>> Patch 1 adds MMU_INTERVAL_NOTIFIER_BLOCK_THP so interval notifier
>> users can ask the MM core to keep the covered VMA range out of THP
>> while the notifier is active. The MM core applies VM_NOHUGEPAGE and
>> clears VM_HUGEPAGE under mmap_lock for write. A later MADV_HUGEPAGE
>> over an active opt-in range is treated as an ignored hint, and
>> MADV_COLLAPSE is rejected by the existing VM_NOHUGEPAGE checks.
>> 
>> Patches 2 and 3 opt in the AMDGPU/KFD paths that need this behavior:
>> HSA userptr BOs, KFD SVM ranges when XNACK is disabled, and
>> GPU_ALWAYS_MAPPED SVM ranges. Other interval notifier users keep their
>> current behavior.
>> 
>> This does not disable THP globally and does not add work to GPU
>> command submission or kernel launch paths. Additional work is limited
>> to opt-in notifier registration, opt-in notifier flag transitions, and
>> MADV_HUGEPAGE attempts that overlap an active opt-in range.
>> 
>> I tested this on top of torvalds/linux commit ab9de95c9cf9 with:
>> 
>>   - scripts/checkpatch.pl --strict --no-tree
>>   - git apply --check
>>   - x86_64 defconfig build with TRANSPARENT_HUGEPAGE=y,
>>     DRM_AMDGPU=m, and HSA_AMD=y for mm/ and AMDGPU/KFD objects
>>   - standalone HSA/HIP reproducers and the ROCm/PyTorch workload that
>>     originally exposed the failure on my Radeon 780M system
>> 
>> The standalone reproducers depend on ROCm userspace libraries, so I
>> have not included them in this series. I can send them separately if
>> useful.
>> 
>> This series was prepared with assistance from OpenAI Codex (GPT-5.5).
>> I reviewed the resulting code and take responsibility for the
>> submission.
>> 
>> Yitao Jiang (3):
>>   mm/mmu_notifier: let interval notifiers block THP
>>   drm/amdgpu: block THP for HSA userptr notifiers
>>   drm/amdkfd: block THP for non-replayable SVM ranges
>> 
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c |  25 ++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c    |  36 ++++-
>>  include/linux/huge_mm.h                 |   5 +-
>>  include/linux/mmu_notifier.h            |  28 ++++
>>  mm/khugepaged.c                         |   9 +-
>>  mm/madvise.c                            |   3 +-
>>  mm/mmu_notifier.c                       | 204 +++++++++++++++++++++++-
>>  7 files changed, 286 insertions(+), 24 deletions(-)
>> 
>> 
>> base-commit: ab9de95c9cf952332ab79453b4b5d1bfca8e514f
>> --
>> 2.53.0
>
