From: "Timur Kristóf" <timur.kristof@gmail.com>
To: Amir Shetaia <amir.shetaia@amd.com>,
Alex Deucher <alexdeucher@gmail.com>
Cc: amd-gfx@lists.freedesktop.org,
"Alex Deucher" <alexander.deucher@amd.com>,
christian.koenig@amd.com, "Marek Olšák" <maraeo@gmail.com>,
"Natalie Vock" <natalie.vock@gmx.de>,
"Melissa Wen" <mwen@igalia.com>
Subject: Re: [PATCH 0/6] drm/amdgpu: Improve retry fault handling
Date: Wed, 13 May 2026 18:43:18 +0200
Message-ID: <2795714.vuYhMxLoTh@timur-hyperion>
In-Reply-To: <CADnq5_Ot+iPKNtxTvA7rWvdsDie3vdrHN8ftwM84FY-+p3A_0g@mail.gmail.com>
On Wednesday, May 13, 2026 6:36:02 PM Central European Summer Time Alex Deucher wrote:
> + Amir
>
> Amir may have some insights on navi4x as he was looking at this recently.
>
> Alex
Hi Alex, Amir,

I think we are very close to enabling retry faults by default on Navi 3.
I'd be happy to receive feedback on this series.

With regard to Navi 4:
I also attempted to get it working on Navi 48. I managed to get retry
faults enabled, but it seems that amdgpu_vm_handle_fault() can't actually
resolve the page fault on Navi 48; it just keeps retrying until it times out.
Christian suggested this may be due to an invalid page being stuck in the
cache. I tried adding a TLB flush, but unfortunately that just made it worse
(it hangs irrecoverably).

Any insight is appreciated!

Thanks & best regards,
Timur
>
> On Wed, May 13, 2026 at 12:30 PM Timur Kristóf <timur.kristof@gmail.com> wrote:
> > Fix some issues in retry fault handling, such as
> > enabling the retry fault interrupt (which is
> > necessary for retry faults to work).
> >
> > Improve retry faults on Navi 3 dGPUs by enabling
> > the filter CAM, which can filter the repeated page
> > fault interrupts that happen when retry faults are
> > enabled, making the handling more efficient.
> >
> > With this series, the kernel is able to mitigate
> > most page faults on Navi 3 without causing a hang
> > and without a need to reset the GPU, when the
> > amdgpu.noretry=0 module parameter is set.
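For readers unfamiliar with the idea, the interrupt filtering described above can be modelled in software roughly as follows. This is a minimal, hypothetical C sketch: it is not the hardware filter CAM and not code from this series, and the names fault_filter, fault_filter_check() and fault_filter_ack(), the 64-entry table and the 4 KiB page granularity are all assumptions made for illustration. A repeated interrupt for a (PASID, page) pair that is still pending is dropped; once the fault is acked (presumably the role of the retry_cam_ack hook added in patch 5), a later fault on the same page is reported again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical software model of a retry-fault interrupt filter.
 * Illustration only: not the hardware CAM and not amdgpu code.
 */
#define FILTER_SLOTS 64u     /* assumed table size; fault_slot() relies on 2^6 */
#define FAULT_PAGE_SHIFT 12u /* assumed 4 KiB page granularity */

struct fault_entry {
	uint64_t page;  /* faulting address >> FAULT_PAGE_SHIFT */
	uint32_t pasid; /* process address space ID of the fault */
	bool pending;   /* true until the fault has been acked */
};

struct fault_filter {
	struct fault_entry slot[FILTER_SLOTS];
};

static void fault_filter_init(struct fault_filter *f)
{
	memset(f, 0, sizeof(*f)); /* all slots start out not pending */
}

/* Multiplicative (Fibonacci) hash of (page, pasid) into one of 64 slots. */
static unsigned int fault_slot(uint64_t page, uint32_t pasid)
{
	uint64_t key = page ^ ((uint64_t)pasid << 32);

	return (unsigned int)(((key * 0x9E3779B97F4A7C15ull) >> 58) &
			      (FILTER_SLOTS - 1u));
}

/*
 * Returns true if the interrupt should be forwarded to the fault handler,
 * false if it repeats a fault on the same page that is still pending.
 */
static bool fault_filter_check(struct fault_filter *f, uint64_t addr,
			       uint32_t pasid)
{
	uint64_t page = addr >> FAULT_PAGE_SHIFT;
	struct fault_entry *e = &f->slot[fault_slot(page, pasid)];

	if (e->pending && e->page == page && e->pasid == pasid)
		return false; /* duplicate retry interrupt: filtered */

	/* New fault, or a collision that evicts the old entry: record it. */
	e->page = page;
	e->pasid = pasid;
	e->pending = true;
	return true;
}

/* Mark a handled fault as done so the next fault on that page is reported. */
static void fault_filter_ack(struct fault_filter *f, uint64_t addr,
			     uint32_t pasid)
{
	uint64_t page = addr >> FAULT_PAGE_SHIFT;
	unsigned int idx = fault_slot(page, pasid);

	assert(idx < FILTER_SLOTS);
	if (f->slot[idx].page == page && f->slot[idx].pasid == pasid)
		f->slot[idx].pending = false;
}
```

Collisions simply overwrite the slot, so this model may occasionally forward a duplicate interrupt but never drops a genuinely new fault; that trade-off keeps the sketch small.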
> >
> > Timur Kristóf (6):
> > drm/amdgpu: Use gmc->noretry instead of amdgpu_noretry directly
> > drm/amdgpu/gfxhub: Enable retry fault interrupts when needed
> > drm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as needed
> > drm/amdgpu/gmc: Don't compare page fault timestamps with other interrupts
> > drm/amdgpu/ih: Add retry_cam_ack IH function pointer
> > drm/amdgpu: Enable retry CAM on Navi 3 dGPUs
> >
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 +++++--
> > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 1 +
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c | 17 ++++++++++-------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v12_0.c | 17 ++++++++++-------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v12_1.c | 19 +++++++++++--------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 15 +++++++++------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 15 +++++++++------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 15 +++++++++------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 15 +++++++++------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.c | 17 ++++++++++-------
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.c | 17 ++++++++++-------
> > drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 ++++-
> > drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 18 +++++++++++++++++-
> > drivers/gpu/drm/amd/amdgpu/ih_v7_0.c | 6 ++++++
> > drivers/gpu/drm/amd/amdgpu/mmhub_v3_0.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/mmhub_v3_0_1.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/mmhub_v3_0_2.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/mmhub_v3_3.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/mmhub_v4_1_0.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/mmhub_v4_2_0.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 8 +++++++-
> > 22 files changed, 134 insertions(+), 71 deletions(-)
> >
> > --
> > 2.54.0
Thread overview: 14+ messages
2026-05-13 16:30 [PATCH 0/6] drm/amdgpu: Improve retry fault handling Timur Kristóf
2026-05-13 16:30 ` [PATCH 1/6] drm/amdgpu: Use gmc->noretry instead of amdgpu_noretry directly Timur Kristóf
2026-05-13 16:30 ` [PATCH 2/6] drm/amdgpu/gfxhub: Enable retry fault interrupts when needed Timur Kristóf
2026-05-13 16:30 ` [PATCH 3/6] drm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as needed Timur Kristóf
2026-05-13 16:30 ` [PATCH 4/6] drm/amdgpu/gmc: Don't compare page fault timestamps with other interrupts Timur Kristóf
2026-05-13 16:30 ` [PATCH 5/6] drm/amdgpu/ih: Add retry_cam_ack IH function pointer Timur Kristóf
2026-05-13 16:30 ` [PATCH 6/6] drm/amdgpu: Enable retry CAM on Navi 3 dGPUs Timur Kristóf
2026-05-13 16:36 ` [PATCH 0/6] drm/amdgpu: Improve retry fault handling Alex Deucher
2026-05-13 16:43 ` Timur Kristóf [this message]
2026-05-13 17:28 ` Shetaia, Amir
2026-05-13 17:32 ` Deucher, Alexander
2026-05-13 17:51 ` Timur Kristóf
2026-05-13 20:32 ` Shetaia, Amir
2026-05-13 22:12 ` Timur Kristóf