From: "Timur Kristóf" <timur.kristof@gmail.com>
To: amd-gfx@lists.freedesktop.org,
"Alex Deucher" <alexander.deucher@amd.com>,
christian.koenig@amd.com, "Natalie Vock" <natalie.vock@gmx.de>,
"Mario Limonciello" <mario.limonciello@amd.com>,
"Amir Shetaia" <Amir.Shetaia@amd.com>,
"Marek Olšák" <maraeo@gmail.com>,
"Tvrtko Ursulin" <tursulin@ursulin.net>
Subject: Re: [PATCH 5/7] drm/amdgpu/gfxhub: Enable retry fault interrupts when needed
Date: Tue, 16 Jun 2026 13:54:20 +0200
Message-ID: <10181145.eNJFYEL58v@timur-hyperion>
In-Reply-To: <c08e20bc-ab15-4de8-8eb4-e01c090868d4@ursulin.net>
On Tuesday, June 16, 2026 10:02:44 AM Central European Summer Time Tvrtko
Ursulin wrote:
> On 25/05/2026 12:45, Timur Kristóf wrote:
> > Enable retry fault interrupts when initializing the GFXHUB
> > system aperture registers according to whether retrying
> > page faults is enabled in amdgpu (ie. amdgpu.noretry=0).
> >
> > Needs to be done for each GFXHUB version at once,
> > because none of them actually enabled this interrupt.
> >
> > Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
> > ---
> >
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c | 9 +++++++--
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v12_0.c | 9 +++++++--
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 9 +++++++--
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 2 ++
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 9 +++++++--
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 9 +++++++--
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.c | 9 +++++++--
> > drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.c | 9 +++++++--
> > 8 files changed, 51 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c index
> > 652eea6eae4a..ef20eafd59ae 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c
> > @@ -155,6 +155,7 @@ static void
> > gfxhub_v11_5_0_init_gart_aperture_regs(struct amdgpu_device *adev)>
> > static void gfxhub_v11_5_0_init_system_aperture_regs(struct
> > amdgpu_device *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > WREG32_SOC15(GC, 0, regGCMC_VM_AGP_BASE, 0);
> > WREG32_SOC15(GC, 0, regGCMC_VM_AGP_BOT, adev->gmc.agp_start >> 24);
> >
> > @@ -180,8 +181,12 @@ static void
> > gfxhub_v11_5_0_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15_PREREG(GC, 0, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2, tmp);
>
> As a side note, I have two patches which shrink these register access
> macros considerably:
>
> https://patchwork.freedesktop.org/patch/720726/?series=165432&rev=1
>
> Going back to this patch, a question - how do gfxhub ip versions relate
> to the default set from gc ip versions in amdgpu_gmc_noretry_set()? I am
> wondering on which platforms, if any, do at this point in the series,
> retry fault interrupts get enabled where they previously were not.
As far as I know, retry faults are currently enabled by default only on some
datacenter GPUs, and not on any consumer GPUs.
This patch just makes sure to actually program the registers to enable retry
faults when they need to be enabled (at the moment, that means when the user
has amdgpu.noretry=0 on their kernel command line). The series does not change
which generations have it enabled by default.
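To illustrate, the read-modify-write done in each hunk amounts to roughly the following. This is only a standalone sketch, not driver code: the bit positions and the helper names set_field() and program_fault_cntl2() are made up for the example. The driver itself uses RREG32_SOC15, REG_SET_FIELD and WREG32_SOC15 with the masks and shifts from the per-ASIC register headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. The real masks and
 * shifts come from the per-ASIC register headers. */
#define PTE_READ_RETRY_SHIFT    0
#define PTE_READ_RETRY_MASK     (1u << PTE_READ_RETRY_SHIFT)
#define RETRY_FAULT_INTR_SHIFT  1
#define RETRY_FAULT_INTR_MASK   (1u << RETRY_FAULT_INTR_SHIFT)

/* Replace one field of a register value and leave all other bits alone. */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
                          uint32_t val)
{
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Compute the new register value from the old one: always set the PTE
 * read retry bit, and set the retry fault interrupt bit only when
 * retrying page faults is enabled (noretry == false). */
static uint32_t program_fault_cntl2(uint32_t old, bool noretry)
{
    uint32_t tmp = old;

    tmp = set_field(tmp, PTE_READ_RETRY_MASK, PTE_READ_RETRY_SHIFT, 1);
    tmp = set_field(tmp, RETRY_FAULT_INTR_MASK, RETRY_FAULT_INTR_SHIFT,
                    !noretry);
    return tmp;
}
```

With noretry=0 both bits end up set. With noretry=1 the interrupt bit is cleared even if it was set before, and unrelated bits of the register are preserved in both cases.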
In order to enable retry faults by default, I would like to make them work
reliably first. At the moment that is blocked by Christian's recent refactor,
which is currently under review. I will have to rebase those two patches once
Christian's work lands. Then we can consider enabling retry faults by default
on Navi 3 and Navi 4 dGPUs.
Note that APUs and Navi 1-2 dGPUs will still need more work because they don't
have the retry CAM, so they will need a better way to filter the page fault
interrupts. However, I don't want to start working on that until the current
three series are reviewed.
Thanks,
Timur
>
> > }
> >
> > static void gfxhub_v11_5_0_init_tlb_regs(struct amdgpu_device *adev)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v12_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v12_0.c index
> > 6cbf837d50dd..ec3ff4dec674 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v12_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v12_0.c
> > @@ -158,6 +158,7 @@ static void
> > gfxhub_v12_0_init_gart_aperture_regs(struct amdgpu_device *adev)>
> > static void gfxhub_v12_0_init_system_aperture_regs(struct amdgpu_device
> > *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > /* Program the AGP BAR */
> > WREG32_SOC15(GC, 0, regGCMC_VM_AGP_BASE, 0);
> >
> > @@ -184,8 +185,12 @@ static void
> > gfxhub_v12_0_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15_PREREG(GC, 0, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2, tmp);
> >
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c index
> > bfe247b1a333..27d7f7cb903f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> > @@ -91,6 +91,7 @@ static void gfxhub_v1_0_init_gart_aperture_regs(struct
> > amdgpu_device *adev)>
> > static void gfxhub_v1_0_init_system_aperture_regs(struct amdgpu_device
> > *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > if (!amdgpu_sriov_vf(adev) || adev->asic_type <= CHIP_VEGA10) {
> >
> > /* Program the AGP BAR */
> >
> > @@ -134,8 +135,12 @@ static void
> > gfxhub_v1_0_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15(GC, 0, VM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL2, tmp);
> >
> > }
> >
> > /* In the case squeezing vram into GART aperture, we don't use
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c index
> > fbdf46070b38..ed9a64bc5aaa 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
> > @@ -176,6 +176,8 @@ gfxhub_v1_2_xcc_init_system_aperture_regs(struct
> > amdgpu_device *adev,>
> > tmp = RREG32_SOC15(GC, GET_INST(GC, i),
> > regVM_L2_PROTECTION_FAULT_CNTL2);
> > tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL2,
> > ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> >
> > + tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> >
> > WREG32_SOC15(GC, GET_INST(GC, i), regVM_L2_PROTECTION_FAULT_CNTL2,
> > tmp);
> >
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c index
> > 9ea593e2c719..152b2735d360 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
> > @@ -151,6 +151,7 @@ static void gfxhub_v2_0_init_gart_aperture_regs(struct
> > amdgpu_device *adev)>
> > static void gfxhub_v2_0_init_system_aperture_regs(struct amdgpu_device
> > *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > if (!amdgpu_sriov_vf(adev)) {
> >
> > /* Program the AGP BAR */
> >
> > @@ -178,8 +179,12 @@ static void
> > gfxhub_v2_0_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15(GC, 0, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL2, tmp);
> >
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c index
> > 30b90d35abd0..83c2ddbbd292 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c
> > @@ -154,6 +154,7 @@ static void gfxhub_v2_1_init_gart_aperture_regs(struct
> > amdgpu_device *adev)>
> > static void gfxhub_v2_1_init_system_aperture_regs(struct amdgpu_device
> > *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > if (amdgpu_sriov_vf(adev))
> >
> > return;
> >
> > @@ -182,8 +183,12 @@ static void
> > gfxhub_v2_1_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15(GC, 0, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL2, tmp);
> >
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.c index
> > 9e6a6e13dec0..90bbb2fe4884 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0.c
> > @@ -150,6 +150,7 @@ static void gfxhub_v3_0_init_gart_aperture_regs(struct
> > amdgpu_device *adev)>
> > static void gfxhub_v3_0_init_system_aperture_regs(struct amdgpu_device
> > *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > /* Program the AGP BAR */
> > WREG32_SOC15(GC, 0, regGCMC_VM_AGP_BASE, 0);
> >
> > @@ -176,8 +177,12 @@ static void
> > gfxhub_v3_0_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15_PREREG(GC, 0, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2, tmp);
> >
> > }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.c
> > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.c index
> > b3b1085c7cd3..1b3c067ab48c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v3_0_3.c
> > @@ -153,6 +153,7 @@ static void
> > gfxhub_v3_0_3_init_gart_aperture_regs(struct amdgpu_device *adev)>
> > static void gfxhub_v3_0_3_init_system_aperture_regs(struct amdgpu_device
> > *adev) {
> >
> > uint64_t value;
> >
> > + u32 tmp;
> >
> > if (amdgpu_sriov_vf(adev))
> >
> > return;
> >
> > @@ -181,8 +182,12 @@ static void
> > gfxhub_v3_0_3_init_system_aperture_regs(struct amdgpu_device *adev)>
> > WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_DEFAULT_ADDR_HI32,
> >
> > (u32)((u64)adev->dummy_page_addr >> 44));
> >
> > - WREG32_FIELD15_PREREG(GC, 0, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > - ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = RREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ACTIVE_PAGE_MIGRATION_PTE_READ_RETRY, 1);
> > + tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL2,
> > + ENABLE_RETRY_FAULT_INTERRUPT, !adev->gmc.noretry);
> > + WREG32_SOC15(GC, 0, regGCVM_L2_PROTECTION_FAULT_CNTL2, tmp);
> >
> > }