* [PATCH 00/10] drm/amdgpu: Improve page fault handling on GMC v6-8
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
Enable the soft IRQ handler ring on SI, CIK and VI, and
delegate the processing of all VM faults to it.

Why?

On old GPUs, handling the interrupts from VM faults can be too
slow, so the interrupt handler (IH) ring may overflow, which can
eventually cause a hang. This is a concern especially on SI and
CIK, where HW limitations in the robustness features of some
shader instructions mean that, in practice, users can see
thousands of VM faults during normal gaming use even when neither
the game nor the UMD does anything wrong.

With this series, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs.

There are also a few miscellaneous improvements to the GMC v6 code.

Timur Kristóf (10):
  drm/amdgpu/si_ih: Enable soft IRQ handler ring
  drm/amdgpu/cik_ih: Enable soft IRQ handler ring
  drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
  drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
  drm/amdgpu/cz_ih: Enable soft IRQ handler ring
  drm/amdgpu/gmc6: Don't print MC client as it's unknown
  drm/amdgpu/gmc6: Cache VM fault info
  drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring

 drivers/gpu/drm/amd/amdgpu/cik_ih.c     | 12 ++++++++++++
 drivers/gpu/drm/amd/amdgpu/cz_ih.c      | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c   | 20 ++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/si_ih.c      | 12 ++++++++++++
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 10 ++++++++++
 8 files changed, 80 insertions(+), 6 deletions(-)
--
2.51.1
* [PATCH 01/10] drm/amdgpu/si_ih: Enable soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
We are going to use the soft IRQ handler ring on GMC v6 (SI)
to process interrupts from VM faults.
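
As a rough illustration of the si_ih_get_wptr change in the diff below,
the following C sketch models why the soft ring can skip the overflow
check: the overflow flag is reported by the hardware through its write
pointer, while the soft ring's write pointer is presumably only ever
written by the driver itself. All toy_* names and TOY_OVERFLOW_MASK are
assumptions made for this sketch; they are not driver symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overflow flag; the driver uses IH_RB_WPTR__RB_OVERFLOW_MASK. */
#define TOY_OVERFLOW_MASK 0x1u

/* Hypothetical, heavily reduced stand-in for the driver's IH ring state. */
struct toy_ring {
	uint32_t wptr;     /* raw write pointer as published by the producer */
	uint32_t ptr_mask; /* ring size in bytes minus one */
	int is_soft;       /* nonzero for the software-filled ring */
};

/*
 * Return the write pointer wrapped to the ring size. For the hardware
 * ring, strip the overflow flag first and report it through *overflowed.
 * For the soft ring, go straight to the masking step.
 */
static uint32_t toy_get_wptr(const struct toy_ring *ih, int *overflowed)
{
	uint32_t wptr;

	assert(ih != NULL && overflowed != NULL);

	wptr = ih->wptr;
	*overflowed = 0;

	if (ih->is_soft)
		goto out;

	if (wptr & TOY_OVERFLOW_MASK) {
		wptr &= ~TOY_OVERFLOW_MASK;
		*overflowed = 1;
	}

out:
	return wptr & ih->ptr_mask;
}
```

The early goto mirrors the shape of the patch: both rings share the
final masking step, and only the hardware ring goes through the
overflow handling.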
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/si_ih.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c b/drivers/gpu/drm/amd/amdgpu/si_ih.c
index 1df00f8a2406..66f650f87243 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c
@@ -96,6 +96,9 @@ static int si_ih_irq_init(struct amdgpu_device *adev)
 	pci_set_master(adev->pdev);
 	si_ih_enable_interrupts(adev);
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
+
 	return 0;
 }
@@ -112,6 +115,9 @@ static u32 si_ih_get_wptr(struct amdgpu_device *adev,
 	wptr = le32_to_cpu(*ih->wptr_cpu);
+	if (ih == &adev->irq.ih_soft)
+		goto out;
+
 	if (wptr & IH_RB_WPTR__RB_OVERFLOW_MASK) {
 		wptr &= ~IH_RB_WPTR__RB_OVERFLOW_MASK;
 		dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, 0x%08X)\n",
@@ -127,6 +133,8 @@ static u32 si_ih_get_wptr(struct amdgpu_device *adev,
 		tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
 		WREG32(IH_RB_CNTL, tmp);
 	}
+
+out:
 	return (wptr & ih->ptr_mask);
 }
@@ -175,6 +183,10 @@ static int si_ih_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, IH_SW_RING_SIZE, true);
+	if (r)
+		return r;
+
 	return amdgpu_irq_init(adev);
 }
--
2.51.1
* [PATCH 02/10] drm/amdgpu/cik_ih: Enable soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
We are going to use the soft IRQ handler ring on GMC v7 (CIK)
to process interrupts from VM faults.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index 41f4705bdbbd..876a3256dba4 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -156,6 +156,9 @@ static int cik_ih_irq_init(struct amdgpu_device *adev)
 	/* enable irqs */
 	cik_ih_enable_interrupts(adev);
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
+
 	return 0;
 }
@@ -192,6 +195,9 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev,
 	wptr = le32_to_cpu(*ih->wptr_cpu);
+	if (ih == &adev->irq.ih_soft)
+		goto out;
+
 	if (wptr & IH_RB_WPTR__RB_OVERFLOW_MASK) {
 		wptr &= ~IH_RB_WPTR__RB_OVERFLOW_MASK;
 		/* When a ring buffer overflow happen start parsing interrupt
@@ -211,6 +217,8 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev,
 		tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
 		WREG32(mmIH_RB_CNTL, tmp);
 	}
+
+out:
 	return (wptr & ih->ptr_mask);
 }
@@ -306,6 +314,10 @@ static int cik_ih_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, IH_SW_RING_SIZE, true);
+	if (r)
+		return r;
+
 	r = amdgpu_irq_init(adev);
 	return r;
--
2.51.1
* [PATCH 03/10] drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
We are going to use the soft IRQ handler ring on GMC v8 (VI)
to process interrupts from VM faults.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
index 1317ede131b6..01cadf898c00 100644
--- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c
@@ -157,6 +157,9 @@ static int iceland_ih_irq_init(struct amdgpu_device *adev)
 	/* enable interrupts */
 	iceland_ih_enable_interrupts(adev);
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
+
 	return 0;
 }
@@ -194,6 +197,9 @@ static u32 iceland_ih_get_wptr(struct amdgpu_device *adev,
 	wptr = le32_to_cpu(*ih->wptr_cpu);
+	if (ih == &adev->irq.ih_soft)
+		goto out;
+
 	if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
 		goto out;
@@ -296,6 +302,10 @@ static int iceland_ih_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, IH_SW_RING_SIZE, true);
+	if (r)
+		return r;
+
 	r = amdgpu_irq_init(adev);
 	return r;
--
2.51.1
* [PATCH 04/10] drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
We are going to use the soft IRQ handler ring on GMC v8 (VI)
to process interrupts from VM faults.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
index 7d17ae56f901..ee8038df17e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c
@@ -159,6 +159,9 @@ static int tonga_ih_irq_init(struct amdgpu_device *adev)
 	/* enable interrupts */
 	tonga_ih_enable_interrupts(adev);
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
+
 	return 0;
 }
@@ -196,6 +199,9 @@ static u32 tonga_ih_get_wptr(struct amdgpu_device *adev,
 	wptr = le32_to_cpu(*ih->wptr_cpu);
+	if (ih == &adev->irq.ih_soft)
+		goto out;
+
 	if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
 		goto out;
@@ -306,6 +312,10 @@ static int tonga_ih_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, IH_SW_RING_SIZE, true);
+	if (r)
+		return r;
+
 	adev->irq.ih.use_doorbell = true;
 	adev->irq.ih.doorbell_index = adev->doorbell_index.ih;
--
2.51.1
* [PATCH 05/10] drm/amdgpu/cz_ih: Enable soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
We are going to use the soft IRQ handler ring on GMC v8 (VI)
to process interrupts from VM faults.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/cz_ih.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
index 2f891fb846d5..bc7a2e06ab5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
@@ -157,6 +157,9 @@ static int cz_ih_irq_init(struct amdgpu_device *adev)
 	/* enable interrupts */
 	cz_ih_enable_interrupts(adev);
+	if (adev->irq.ih_soft.ring_size)
+		adev->irq.ih_soft.enabled = true;
+
 	return 0;
 }
@@ -194,6 +197,9 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev,
 	wptr = le32_to_cpu(*ih->wptr_cpu);
+	if (ih == &adev->irq.ih_soft)
+		goto out;
+
 	if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
 		goto out;
@@ -297,6 +303,10 @@ static int cz_ih_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
+	r = amdgpu_ih_ring_init(adev, &adev->irq.ih_soft, IH_SW_RING_SIZE, true);
+	if (r)
+		return r;
+
 	r = amdgpu_irq_init(adev);
 	return r;
--
2.51.1
* [PATCH 06/10] drm/amdgpu/gmc6: Don't print MC client as it's unknown
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
The VM_CONTEXT1_PROTECTION_FAULT_MCCLIENT register
doesn't exist on GMC v6 so we can't print the MC client as a
string like we do on GMC v7-v8. However, we still print the
mc_id from VM_CONTEXT1_PROTECTION_FAULT_STATUS.
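
For context, the string form mentioned above comes from unpacking the
32-bit register value into four characters, most significant byte
first, which is what the block[5] initializer removed by this patch
did. A minimal standalone C sketch of that unpacking follows; the
function name and the sample value in the usage note are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack a 32-bit client tag into a NUL-terminated four-character
 * string, most significant byte first. The caller provides a buffer
 * of at least five bytes. The function name is hypothetical.
 */
static void toy_mc_client_to_str(uint32_t mc_client, char out[5])
{
	assert(out != NULL);

	out[0] = (char)((mc_client >> 24) & 0xff);
	out[1] = (char)((mc_client >> 16) & 0xff);
	out[2] = (char)((mc_client >> 8) & 0xff);
	out[3] = (char)(mc_client & 0xff);
	out[4] = '\0';
}
```

With the made-up input 0x54435030 the buffer holds the bytes 'T', 'C',
'P', '0'. On GMC v6 there is no register to read such a tag from, so
passing 0 (as the old call site did) could only ever yield an empty
string.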
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 499dfd78092d..f6715648b08a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -610,23 +610,21 @@ static void gmc_v6_0_gart_disable(struct amdgpu_device *adev)
 }
 static void gmc_v6_0_vm_decode_fault(struct amdgpu_device *adev,
-				     u32 status, u32 addr, u32 mc_client)
+				     u32 status, u32 addr)
 {
 	u32 mc_id;
 	u32 vmid = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS, VMID);
 	u32 protections = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
 					PROTECTIONS);
-	char block[5] = { mc_client >> 24, (mc_client >> 16) & 0xff,
-		(mc_client >> 8) & 0xff, mc_client & 0xff, 0 };
 	mc_id = REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
 			      MEMORY_CLIENT_ID);
-	dev_err(adev->dev, "VM fault (0x%02x, vmid %d) at page %u, %s from '%s' (0x%08x) (%d)\n",
+	dev_err(adev->dev, "VM fault (0x%02x, vmid %d) at page %u, %s from %d\n",
 		protections, vmid, addr,
 		REG_GET_FIELD(status, VM_CONTEXT1_PROTECTION_FAULT_STATUS,
 			      MEMORY_CLIENT_RW) ?
-		"write" : "read", block, mc_client, mc_id);
+		"write" : "read", mc_id);
 }
 static const u32 mc_cg_registers[] = {
@@ -1089,7 +1087,7 @@ static int gmc_v6_0_process_interrupt(struct amdgpu_device *adev,
 			addr);
 		dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x%08X\n",
 			status);
-		gmc_v6_0_vm_decode_fault(adev, status, addr, 0);
+		gmc_v6_0_vm_decode_fault(adev, status, addr);
 	}
 	return 0;
--
2.51.1
* [PATCH 07/10] drm/amdgpu/gmc6: Cache VM fault info
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
Call amdgpu_vm_update_fault_cache on GMC v6 similarly to how we
do in GMC v7-v8 so that VM fault info can be used later by
userspace for debugging.
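
One detail of the call in the diff below is the (u64) cast before the
shift: the fault register holds a 32-bit page number, and shifting it
while still 32 bits wide would drop the upper address bits. The C
sketch below contrasts the two variants. It assumes a page shift of 12
(4 KiB GPU pages) under the name TOY_GPU_PAGE_SHIFT; both function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this sketch: 4 KiB GPU pages, i.e. a shift of 12. */
#define TOY_GPU_PAGE_SHIFT 12

/* Widen first, then shift: the full byte address is preserved. */
static uint64_t toy_fault_page_to_addr(uint32_t page)
{
	return ((uint64_t)page) << TOY_GPU_PAGE_SHIFT;
}

/* Shift while still 32 bits wide: bits above bit 31 are lost. */
static uint32_t toy_fault_page_to_addr_truncated(uint32_t page)
{
	return (uint32_t)(page << TOY_GPU_PAGE_SHIFT);
}
```

For a page number of 0x00100000 the widened variant gives the byte
address 0x100000000, while the truncated variant wraps to 0.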
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index f6715648b08a..bc6a74903f4e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -1077,6 +1077,10 @@ static int gmc_v6_0_process_interrupt(struct amdgpu_device *adev,
 	if (!addr && !status)
 		return 0;
+	amdgpu_vm_update_fault_cache(adev, entry->pasid,
+				     ((u64)addr) << AMDGPU_GPU_PAGE_SHIFT,
+				     status, AMDGPU_GFXHUB(0));
+
 	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_FIRST)
 		gmc_v6_0_set_fault_enable_default(adev, false);
--
2.51.1
* [PATCH 08/10] drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
On old GPUs, handling the interrupts from VM faults can be too
slow, so the interrupt handler (IH) ring may overflow, which can
eventually cause a hang.

Delegate the processing of all VM faults to the soft IRQ handler
ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs or reboots.
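
The guard added in the diff below can be pictured with a small
standalone C model: an entry read from the hardware ring is copied to
the soft ring and the handler returns early, while the same handler,
when later called for an entry read from the soft ring, does the real
work. Everything named toy_* is an assumption made for this sketch;
in particular toy_delegate() only imitates the copy and does not
reflect how amdgpu_irq_delegate() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRY_DW 4u /* dwords per entry, matching the "4" in the patch */
#define TOY_RING_DW 64u /* soft ring size in dwords, a power of two */

struct toy_ring {
	uint32_t buf[TOY_RING_DW];
	uint32_t wptr; /* write offset in dwords */
	uint32_t rptr; /* read offset in dwords */
	bool enabled;
};

struct toy_entry {
	const struct toy_ring *ih; /* ring this entry was read from */
	uint32_t dw[TOY_ENTRY_DW];
};

struct toy_dev {
	struct toy_ring ih;      /* stands in for the hardware ring */
	struct toy_ring ih_soft; /* software ring */
	unsigned int faults_processed;
};

/* Copy one entry to the soft ring (no overflow handling in this toy). */
static void toy_delegate(struct toy_dev *dev, const struct toy_entry *e)
{
	struct toy_ring *soft = &dev->ih_soft;
	uint32_t i;

	for (i = 0; i < TOY_ENTRY_DW; i++)
		soft->buf[(soft->wptr + i) & (TOY_RING_DW - 1)] = e->dw[i];
	soft->wptr = (soft->wptr + TOY_ENTRY_DW) & (TOY_RING_DW - 1);
}

/* Returns 1 if the entry was delegated, 0 if it was processed here. */
static int toy_process_fault(struct toy_dev *dev, const struct toy_entry *e)
{
	assert(dev != NULL && e != NULL);

	/* Same shape as the guard in the patch: never re-delegate. */
	if (dev->ih_soft.enabled && e->ih != &dev->ih_soft) {
		toy_delegate(dev, e);
		return 1;
	}

	dev->faults_processed++; /* placeholder for the slow register work */
	return 0;
}

/* Deferred context: replay every queued entry through the same handler. */
static void toy_drain_soft(struct toy_dev *dev)
{
	struct toy_ring *soft = &dev->ih_soft;
	uint32_t i;

	while (soft->rptr != soft->wptr) {
		struct toy_entry e = { .ih = soft };

		for (i = 0; i < TOY_ENTRY_DW; i++)
			e.dw[i] = soft->buf[(soft->rptr + i) & (TOY_RING_DW - 1)];
		soft->rptr = (soft->rptr + TOY_ENTRY_DW) & (TOY_RING_DW - 1);
		toy_process_fault(dev, &e);
	}
}
```

The check against the source ring is what keeps the replayed entry
from being delegated a second time; without it the handler would
queue the same entry again on every pass.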
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index bc6a74903f4e..a8ec95f42926 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -1070,6 +1070,12 @@ static int gmc_v6_0_process_interrupt(struct amdgpu_device *adev,
 {
 	u32 addr, status;
+	/* Delegate to the soft IRQ handler ring */
+	if (adev->irq.ih_soft.enabled && entry->ih != &adev->irq.ih_soft) {
+		amdgpu_irq_delegate(adev, entry, 4);
+		return 1;
+	}
+
 	addr = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_ADDR);
 	status = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_STATUS);
 	WREG32_P(mmVM_CONTEXT1_CNTL2, 1, ~1);
--
2.51.1
* [PATCH 09/10] drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
On old GPUs, handling the interrupts from VM faults can be too
slow, so the interrupt handler (IH) ring may overflow, which can
eventually cause a hang.

Delegate the processing of all VM faults to the soft IRQ handler
ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs or reboots.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 0e5e54d0a9a5..fbd0bf147f50 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1261,6 +1261,12 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev,
 {
 	u32 addr, status, mc_client, vmid;
+	/* Delegate to the soft IRQ handler ring */
+	if (adev->irq.ih_soft.enabled && entry->ih != &adev->irq.ih_soft) {
+		amdgpu_irq_delegate(adev, entry, 4);
+		return 1;
+	}
+
 	addr = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_ADDR);
 	status = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_STATUS);
 	mc_client = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_MCCLIENT);
--
2.51.1
* [PATCH 10/10] drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
From: Timur Kristóf @ 2025-11-26 13:29 UTC (permalink / raw)
To: Christian König, Alex Deucher, amd-gfx; +Cc: Timur Kristóf
On old GPUs, handling the interrupts from VM faults can be too
slow, so the interrupt handler (IH) ring may overflow, which can
eventually cause a hang.

Delegate the processing of all VM faults to the soft IRQ handler
ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs or reboots.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index e1509480dfc2..6551b60f2584 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1439,6 +1439,12 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
 		return 0;
 	}
+	/* Delegate to the soft IRQ handler ring */
+	if (adev->irq.ih_soft.enabled && entry->ih != &adev->irq.ih_soft) {
+		amdgpu_irq_delegate(adev, entry, 4);
+		return 1;
+	}
+
 	addr = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_ADDR);
 	status = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_STATUS);
 	mc_client = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_MCCLIENT);
--
2.51.1
* Re: [PATCH 00/10] drm/amdgpu: Improve page fault handling on GMC v6-8
From: Christian König @ 2025-11-26 14:22 UTC (permalink / raw)
To: Timur Kristóf, Alex Deucher, amd-gfx
On 11/26/25 14:29, Timur Kristóf wrote:
> Enable the soft IRQ handler ring on SI, CIK, VI and
> delegate the processing of all VM faults to the soft
> IRQ handler ring.
>
> Why?
>
> On old GPUs, it may be an issue that handling the interrupts from
> VM faults is too slow and the interrupt handler (IH) ring may
> overflow, which can cause an eventual hang. This is a concern
> especially on SI and CIK where there are some HW limitations
> regarding robustness features with some shader instructions,
> which in practice means that users can see thousands of VM faults
> during normal gaming use even when the game or the UMD don't do
> anything wrong.
>
> With this series, we spend much less time in the IRQ handler that
> interacts with the HW IH ring, which significantly reduces the
> chance of hangs.
>
> There are also a few misc improvements to the GMC v6 code.
Reviewed-by: Christian König <christian.koenig@amd.com> for the entire series.
@Alex do you want to pick that up for amd-staging-drm-next or should I?
Regards,
Christian.
* Re: [PATCH 00/10] drm/amdgpu: Improve page fault handling on GMC v6-8
From: Alex Deucher @ 2025-11-26 14:46 UTC (permalink / raw)
To: Christian König; +Cc: Timur Kristóf, Alex Deucher, amd-gfx
On Wed, Nov 26, 2025 at 9:29 AM Christian König
<christian.koenig@amd.com> wrote:
> Reviewed-by: Christian König <christian.koenig@amd.com> for the entire series.
>
> @Alex do you want to pick that up for amd-staging-drm-next or should I?
I'll be off the next few days so if you can pick it up, that would be great.
Thanks,
Alex
* Re: [PATCH 00/10] drm/amdgpu: Improve page fault handling on GMC v6-8
From: Christian König @ 2025-11-26 15:04 UTC (permalink / raw)
To: Alex Deucher; +Cc: Timur Kristóf, Alex Deucher, amd-gfx
On 11/26/25 15:46, Alex Deucher wrote:
> On Wed, Nov 26, 2025 at 9:29 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Reviewed-by: Christian König <christian.koenig@amd.com> for the entire series.
>>
>> @Alex do you want to pick that up for amd-staging-drm-next or should I?
>
> I'll be off the next few days so if you can pick it up, that would be great.
Going to take care of that tomorrow.
Thanks,
Christian.