* [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Alex Deucher @ 2026-03-04 13:54 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Deucher, Christian König, Prike Liang
This reverts commit f3854e04b708d73276c4488231a8bd66d30b4671.
This causes framebuffer corruption after suspend.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798
Cc: Christian König <christian.koenig@amd.com>
Cc: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 01fef0e4f4085..25b1d679ba262 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1073,7 +1073,7 @@ amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params *params,
 	}
 
 	/* Prepare a TLB flush fence to be attached to PTs */
-	if (!params->unlocked) {
+	if (!params->unlocked && vm->is_compute_context) {
 		amdgpu_vm_tlb_fence_create(params->adev, vm, fence);
 
 		/* Makes sure no PD/PT is freed before the flush */
--
2.53.0
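For readers following the thread, the effect of the one-line change can be modeled outside the kernel as two predicates. This is a minimal standalone sketch, not driver code: `struct vm_update_model` and all function names are hypothetical, and only the two fields `unlocked` and `is_compute_context` are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two values the patched condition reads. */
struct vm_update_model {
	bool unlocked;           /* mirrors params->unlocked */
	bool is_compute_context; /* mirrors vm->is_compute_context */
};

/* Condition on the '-' line: create a TLB fence for every locked update. */
static inline bool fence_before_patch(const struct vm_update_model *m)
{
	return !m->unlocked;
}

/* Condition on the '+' line: create a TLB fence for compute VMs only. */
static inline bool fence_after_patch(const struct vm_update_model *m)
{
	return !m->unlocked && m->is_compute_context;
}

/* True when the patch changes whether a fence is created for this update. */
static inline bool patch_changes_behavior(const struct vm_update_model *m)
{
	return fence_before_patch(m) != fence_after_patch(m);
}

/* Usage example: only locked updates on non-compute VMs are affected. */
static inline void model_selfcheck(void)
{
	struct vm_update_model gfx = { .unlocked = false, .is_compute_context = false };
	struct vm_update_model compute = { .unlocked = false, .is_compute_context = true };

	assert(patch_changes_behavior(&gfx));
	assert(!patch_changes_behavior(&compute));
}
```

In this model, the two conditions differ only for a locked update on a non-compute VM, which is the case where the patch stops creating the fence.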
* Re: [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Christian König @ 2026-03-04 13:56 UTC (permalink / raw)
To: Alex Deucher, amd-gfx; +Cc: Prike Liang
On 3/4/26 14:54, Alex Deucher wrote:
> This reverts commit f3854e04b708d73276c4488231a8bd66d30b4671.
>
> This causes framebuffer corruption after suspend.
But prevents massive memory corruption with userqueues.
I have strong doubts that this is related to the FB corruption in any way; it will just change the timing.
Regards,
Christian.
* RE: [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Liang, Prike @ 2026-03-05 6:48 UTC (permalink / raw)
To: Koenig, Christian, Deucher, Alexander,
amd-gfx@lists.freedesktop.org
[Public]
It’s possible that we failed to save and invalidate some active pages during suspend, which then prevents those pages from being restored correctly on resume.
For now, we still rely on this patch to keep the userq page tables updated and synchronized. Until the full solution is ready, how about we fall back to the initial approach and restrict this TLB flush to only the userq path?
Regards,
Prike
* Re: [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Christian König @ 2026-03-05 9:43 UTC (permalink / raw)
To: Liang, Prike, Deucher, Alexander, amd-gfx@lists.freedesktop.org
The original reporter already mentioned on the ticket that this patch is not the actual cause of the issues.
It basically just changes the timing of when the TLB fence is created and eventually waited on.
Let's see what the reporter finds with his extended bisect.
Regards,
Christian.
* Re: [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Mario Limonciello @ 2026-03-12 21:08 UTC (permalink / raw)
To: Christian König, Liang, Prike, Deucher, Alexander,
amd-gfx@lists.freedesktop.org
There is actually a contingent of two people who claim that this patch
is the cause of MES resets here:
https://gitlab.freedesktop.org/drm/amd/-/issues/4749
* Re: [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Christian König @ 2026-03-13 12:03 UTC (permalink / raw)
To: Mario Limonciello, Liang, Prike, Deucher, Alexander,
amd-gfx@lists.freedesktop.org
Yeah, but that is still not the root cause.
Attaching the TLB fence all the time just makes more use of the MES; it doesn't cause any additional problems that wouldn't have been there before.
Regards,
Christian.
* Re: [PATCH] Revert "drm/amdgpu: attach tlb fence to the PTs update"
From: Mario Limonciello @ 2026-03-13 23:58 UTC (permalink / raw)
To: Christian König, Mario Limonciello, Liang, Prike,
Deucher, Alexander, amd-gfx@lists.freedesktop.org
Well, the original patch was intended for userq but is causing issues on
systems without userq. How about just narrowing it down to only userq
for now until we have a root cause?
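As a rough illustration of what "narrowing it down to only userq" could look like, here is a standalone sketch of such a gate. It is not driver code and makes two loud assumptions: the `uses_userq` flag is hypothetical (the thread does not say how the driver would detect user queues), and keeping the compute-context case alongside userq is only one possible reading of the proposal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; only unlocked and is_compute_context appear in the patch. */
struct vm_gate_model {
	bool unlocked;           /* mirrors params->unlocked */
	bool is_compute_context; /* mirrors vm->is_compute_context */
	bool uses_userq;         /* assumed flag, not a field named in this thread */
};

/* Create the TLB fence only for locked updates on compute or userq VMs. */
static inline bool fence_needed_userq_only(const struct vm_gate_model *m)
{
	if (m->unlocked)
		return false;
	return m->is_compute_context || m->uses_userq;
}

/* Usage example: a graphics VM without user queues gets no fence. */
static inline void userq_gate_selfcheck(void)
{
	struct vm_gate_model legacy_gfx = { .unlocked = false };
	struct vm_gate_model userq_gfx = { .unlocked = false, .uses_userq = true };

	assert(!fence_needed_userq_only(&legacy_gfx));
	assert(fence_needed_userq_only(&userq_gfx));
}
```

Under these assumptions, systems without userq would see the same behavior as with the revert applied, while userq VMs would keep the fence that protects their page tables.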