From: "Christian König" <christian.koenig@amd.com>
To: "Timur Kristóf" <timur.kristof@gmail.com>,
"Tvrtko Ursulin" <tursulin@ursulin.net>,
amd-gfx@lists.freedesktop.org,
"Alex Deucher" <alexander.deucher@amd.com>,
"Natalie Vock" <natalie.vock@gmx.de>,
"Mario Limonciello" <mario.limonciello@amd.com>,
"Amir Shetaia" <Amir.Shetaia@amd.com>,
"Marek Olšák" <maraeo@gmail.com>,
"Yang, Philip" <Philip.Yang@amd.com>
Subject: Re: [PATCH 3/7] drm/amdgpu/gmc: Don't compare page fault timestamps with other interrupts
Date: Tue, 16 Jun 2026 14:48:37 +0200 [thread overview]
Message-ID: <420b30ab-e6c1-411a-be5f-ca9ed17cceb0@amd.com> (raw)
In-Reply-To: <2899310.lGaqSPkdTl@timur-hyperion>
On 6/16/26 13:17, Timur Kristóf wrote:
> On Tuesday, June 16, 2026 12:15:12 PM Central European Summer Time Christian
> König wrote:
...
>> The check was added with patch to make retry page faults more resilent to IH
>> ring buffer overflow: commit 3c2d6ea27955cfac8590884d207353eece8c2cee
>> Author: Philip Yang <Philip.Yang@amd.com>
>> Date: Thu Nov 18 15:24:55 2021 -0500
>>
>> drm/amdgpu: handle IH ring1 overflow
>>
>> IH ring1 is used to process GPU retry fault, overflow is enabled to
>> drain retry fault because we want receive other interrupts while
>> handling retry fault to recover range. There is no overflow flag set
>> when wptr pass rptr. Use timestamp of rptr and wptr to handle overflow
>> and drain retry fault.
>>
>> If fault timestamp goes backward, the fault is filtered and should not
>> be processed. Drain fault is finished if processed_timestamp is equal to
>> or larger than checkpoint timestamp.
>>
>> Add amdgpu_ih_functions interface decode_iv_ts for different chips to
>> get timestamp from IV entry with different iv size and timestamp offset.
>> amdgpu_ih_decode_iv_ts_helper is used for vega10, vega20, navi10.
>>
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> Acked-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>> But as far as I can see the whole idea is completely broken. The timestamp
>> can also go backward in case of a reset for example.
>>
>> So having this check like this is clearly a bad idea.
>>
>> What we could do in amdgpu_gmc_filter_faults() is to check some range for
>> the timestamp, e.g. last seen timestamp (in amdgpu_gmc_filter_faults(),
>> e.g. only faults) - value X is considered a duplicate caused by ring buffer
>> wrap around.
>>
>
> Hi,
>
> I agree that filtering based on timestamps is not a good idea in general, and
> we should probably re-think it. I wish someone had thought of that in 2021
> when the above patch was accepted.
I remember that I brought up some concerns about that when we originally reviewed the patch, but it was quite an urgent fix (as always) so we accepted it in the end.
> Maybe this patch should be dropped and the timestamp check should be removed
> instead. What do you guys think about that?
That's fine with me but adding Philip just in case we could run into regressions with that.
Regards,
Christian.
>
> Best regards,
> Timur
next prev parent reply other threads:[~2026-06-16 12:48 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-25 11:45 [PATCH 0/7] drm/amdgpu: Improve retry fault handling (v2) Timur Kristóf
2026-05-25 11:45 ` [PATCH 1/7] drm/amdgpu: Use gmc->noretry instead of amdgpu_noretry directly Timur Kristóf
2026-05-25 11:45 ` [PATCH 2/7] drm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as needed Timur Kristóf
2026-05-26 15:00 ` Alex Deucher
2026-05-25 11:45 ` [PATCH 3/7] drm/amdgpu/gmc: Don't compare page fault timestamps with other interrupts Timur Kristóf
2026-06-15 14:32 ` Tvrtko Ursulin
2026-06-15 14:52 ` Timur Kristóf
2026-06-15 15:23 ` Tvrtko Ursulin
2026-06-15 15:32 ` Timur Kristóf
2026-06-15 15:48 ` Tvrtko Ursulin
2026-06-16 10:15 ` Christian König
2026-06-16 11:17 ` Timur Kristóf
2026-06-16 12:48 ` Christian König [this message]
2026-05-25 11:45 ` [PATCH 4/7] drm/amdgpu/ih: Add retry_cam_ack IH function pointer Timur Kristóf
2026-06-15 14:44 ` Tvrtko Ursulin
2026-06-15 15:02 ` Timur Kristóf
2026-06-16 10:34 ` Christian König
2026-05-25 11:45 ` [PATCH 5/7] drm/amdgpu/gfxhub: Enable retry fault interrupts when needed Timur Kristóf
2026-06-16 8:02 ` Tvrtko Ursulin
2026-06-16 11:54 ` Timur Kristóf
2026-05-25 11:45 ` [PATCH 6/7] drm/amdgpu/gfxhub: Respect noretry flag for retry faults on GFX12.1 Timur Kristóf
2026-06-16 8:09 ` Tvrtko Ursulin
2026-06-16 11:57 ` Timur Kristóf
2026-06-16 12:16 ` Tvrtko Ursulin
2026-06-16 12:36 ` Timur Kristóf
2026-05-25 11:45 ` [PATCH 7/7] drm/amdgpu: Enable retry CAM on Navi 3 dGPUs Timur Kristóf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=420b30ab-e6c1-411a-be5f-ca9ed17cceb0@amd.com \
--to=christian.koenig@amd.com \
--cc=Amir.Shetaia@amd.com \
--cc=Philip.Yang@amd.com \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=maraeo@gmail.com \
--cc=mario.limonciello@amd.com \
--cc=natalie.vock@gmx.de \
--cc=timur.kristof@gmail.com \
--cc=tursulin@ursulin.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.