* [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
@ 2025-01-08 13:59 Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP Jiang Liu
` (14 more replies)
0 siblings, 15 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
While recently testing suspend/resume functionality with AMD GPUs, we
encountered several resource tracking bugs, such as double buffer free,
use after free and unbalanced irq reference counts. We tried to solve
these issues case by case, but found that may not be the right way.
Especially for the unbalanced irq reference count, new issues kept
appearing once we had fixed the known ones. After analyzing the related
source code, we found that there may be some fundamental implementation
flaws behind these resource tracking issues.
The amdgpu driver has two major state machines to drive the device
management flow: one for ip blocks, the other for ras blocks.
The hook points defined in struct amd_ip_funcs for device setup/teardown
are symmetric, but the implementation is asymmetric, sometimes even
ambiguous. The two most obvious issues we noticed are:
1) amdgpu_irq_get() is called from .late_init() but amdgpu_irq_put()
is called from .hw_fini() instead of .early_fini() (sketched below).
2) the way the ip_block.status.valid/sw/hw/late_initialized flags are
cleared doesn't match the way they are set.
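To illustrate issue 1), here is a minimal sketch of the asymmetric
pairing; the foo_* callbacks and the foo_irq source are hypothetical
and only stand in for the pattern used by several ip blocks:

static int foo_late_init(struct amdgpu_ip_block *ip_block)
{
        struct amdgpu_device *adev = ip_block->adev;

        /* reference taken on the init side, in .late_init() */
        return amdgpu_irq_get(adev, &adev->foo_irq, 0);
}

static int foo_hw_fini(struct amdgpu_ip_block *ip_block)
{
        struct amdgpu_device *adev = ip_block->adev;

        /*
         * reference dropped in .hw_fini(), which is not the mirror of
         * .late_init(); a suspend/resume path that runs .hw_fini()
         * without a matching .late_init() unbalances the count
         */
        amdgpu_irq_put(adev, &adev->foo_irq, 0);
        return 0;
}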
When taking device suspend/resume into account, in addition to device
probe/remove, things get much more complex. Some issues arise because
many suspend/resume implementations directly reuse .hw_init/.hw_fini/
.late_init hook points.
So we try to fix those issues with two enhancements/refinements to the
current device management state machines.
The first change is to make the ip block state machine and associated
status flags work in a stack-like way as below:
Callback Status Flags
early_init: valid = true
sw_init: sw = true
hw_init: hw = true
late_init: late_initialized = true
early_fini: late_initialized = false
hw_fini: hw = false
sw_fini: sw = false
late_fini: valid = false
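A minimal C sketch of this discipline follows (the bring_up/tear_down
helpers are illustrative only, not actual driver functions): every flag
set on the init path is cleared by its mirror stage on the fini path,
in reverse order, like pushes and pops on a stack.

static void ip_block_bring_up(struct amdgpu_ip_block *ip_block)
{
        ip_block->status.valid = true;             /* early_init */
        ip_block->status.sw = true;                /* sw_init    */
        ip_block->status.hw = true;                /* hw_init    */
        ip_block->status.late_initialized = true;  /* late_init  */
}

static void ip_block_tear_down(struct amdgpu_ip_block *ip_block)
{
        ip_block->status.late_initialized = false; /* early_fini */
        ip_block->status.hw = false;               /* hw_fini    */
        ip_block->status.sw = false;               /* sw_fini    */
        ip_block->status.valid = false;            /* late_fini  */
}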
The same is done for the ras block state machine, though it is much
simpler.
The second change is to fine-tune the overall device management
workflow as below:
1. amdgpu_driver_load_kms()
amdgpu_device_init()
amdgpu_device_ip_early_init()
ip_blocks[i].early_init()
ip_blocks[i].status.valid = true
amdgpu_device_ip_init()
amdgpu_ras_init()
ip_blocks[i].sw_init()
ip_blocks[i].status.sw = true
ip_blocks[i].hw_init()
ip_blocks[i].status.hw = true
amdgpu_device_ip_late_init()
ip_blocks[i].late_init()
ip_blocks[i].status.late_initialized = true
amdgpu_ras_late_init()
ras_blocks[i].ras_late_init()
amdgpu_ras_feature_enable_on_boot()
2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
amdgpu_device_suspend()
amdgpu_ras_early_fini()
ras_blocks[i].ras_early_fini()
amdgpu_ras_feature_disable()
amdgpu_ras_suspend()
amdgpu_ras_disable_all_features()
+++ ip_blocks[i].early_fini()
+++ ip_blocks[i].status.late_initialized = false
ip_blocks[i].suspend()
3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
amdgpu_device_resume()
amdgpu_device_ip_resume()
ip_blocks[i].resume()
amdgpu_device_ip_late_init()
ip_blocks[i].late_init()
ip_blocks[i].status.late_initialized = true
amdgpu_ras_late_init()
ras_blocks[i].ras_late_init()
amdgpu_ras_feature_enable_on_boot()
amdgpu_ras_resume()
amdgpu_ras_enable_all_features()
4. amdgpu_driver_unload_kms()
amdgpu_device_fini_hw()
amdgpu_ras_early_fini()
ras_blocks[i].ras_early_fini()
+++ ip_blocks[i].early_fini()
+++ ip_blocks[i].status.late_initialized = false
ip_blocks[i].hw_fini()
ip_blocks[i].status.hw = false
5. amdgpu_driver_release_kms()
amdgpu_device_fini_sw()
amdgpu_device_ip_fini()
ip_blocks[i].sw_fini()
ip_blocks[i].status.sw = false
--- ip_blocks[i].status.valid = false
+++ amdgpu_ras_fini()
ip_blocks[i].late_fini()
+++ ip_blocks[i].status.valid = false
--- ip_blocks[i].status.late_initialized = false
--- amdgpu_ras_fini()
The main changes include:
1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
Currently there's only one ip block which provides an `early_fini`
callback. We have added a check of `in_s3` to keep the current behavior
in function amdgpu_dm_early_fini(). So there should be no functional
changes.
2) set ip_blocks[i].status.late_initialized to false after calling the
`early_fini` callback. We have audited all usages of the
late_initialized flag and found no functional changes.
3) only set ip_blocks[i].status.valid = false after calling the
`late_fini` callback.
4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini().
Then we try to refine each subsystem, such as nbio, asic, gfx, gmc,
ras etc., to follow the new design. Currently we have only taken the
nbio and asic subsystems as examples to show the proposed changes.
Once we have confirmed this is the right way to go, we will handle the
remaining subsystems.
This work is at an early stage and we are requesting comments; any
comments and suggestions are welcome!
Jiang Liu (13):
amdgpu: wrong array index to get ip block for PSP
drm/admgpu: add helper functions to track status for ras manager
drm/amdgpu: add a flag to track ras debugfs creation status
drm/amdgpu: free all resources on error recovery path of
amdgpu_ras_init()
drm/amdgpu: introduce a flag to track refcount held for features
drm/amdgpu: enhance amdgpu_ras_block_late_fini()
drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR
drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini()
drm/amdgpu: make IP block state machine works in stack like way
drm/admgpu: make device state machine work in stack like way
drm/amdgpu/sdma: improve the way to manage irq reference count
drm/amdgpu/nbio: improve the way to manage irq reference count
drm/amdgpu/asic: make ip block operations symmetric by .early_fini()
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +-
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +-
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 +
drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 +
drivers/gpu/drm/amd/amdgpu/nv.c | 14 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 -
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +--
drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++---
drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++--
drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++-
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +
25 files changed, 326 insertions(+), 118 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 27+ messages in thread
* [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP
2025-01-08 13:59 Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 15:02 ` Alex Deucher
2025-01-08 13:59 ` [RFC PATCH 02/13] drm/admgpu: add helper functions to track status for ras manager Jiang Liu
` (13 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
The adev->ip_blocks array is not indexed by AMD_IP_BLOCK_TYPE_xxx;
instead we should call amdgpu_device_ip_get_ip_block() to get the
corresponding IP block object.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 0b1e280ee228..25c06c6c8a2a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -3830,10 +3830,12 @@ static ssize_t psp_usbc_pd_fw_sysfs_read(struct device *dev,
{
struct drm_device *ddev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(ddev);
+ struct amdgpu_ip_block *ip_block;
uint32_t fw_ver;
int ret;
- if (!adev->ip_blocks[AMD_IP_BLOCK_TYPE_PSP].status.late_initialized) {
+ ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_PSP);
+ if (ip_block == NULL || !ip_block->status.late_initialized) {
dev_info(adev->dev, "PSP block is not ready yet\n.");
return -EBUSY;
}
@@ -3862,8 +3864,10 @@ static ssize_t psp_usbc_pd_fw_sysfs_write(struct device *dev,
struct amdgpu_bo *fw_buf_bo = NULL;
uint64_t fw_pri_mc_addr;
void *fw_pri_cpu_addr;
+ struct amdgpu_ip_block *ip_block;
- if (!adev->ip_blocks[AMD_IP_BLOCK_TYPE_PSP].status.late_initialized) {
+ ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_PSP);
+ if (ip_block == NULL || !ip_block->status.late_initialized) {
dev_err(adev->dev, "PSP block is not ready yet.");
return -EBUSY;
}
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 02/13] drm/admgpu: add helper functions to track status for ras manager
2025-01-08 13:59 Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status Jiang Liu
` (12 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Add helper functions to track status for ras manager and ip blocks.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 38 +++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 38 +++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 10 +++++++
3 files changed, 86 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e4b13e729770..32941f29507c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -367,12 +367,29 @@ bool amdgpu_device_ip_is_valid(struct amdgpu_device *adev,
#define AMDGPU_MAX_IP_NUM 16
+enum amdgpu_marker {
+ // shared by ip blocks and ras blocks
+ AMDGPU_MARKER_IRQ0 = 0,
+ AMDGPU_MARKER_IRQ1 = 1,
+ AMDGPU_MARKER_IRQ2 = 2,
+ AMDGPU_MARKER_IRQ3 = 3,
+ AMDGPU_MARKER_IRQ4 = 4,
+ AMDGPU_MARKER_IRQ5 = 5,
+ AMDGPU_MARKER_IRQ6 = 6,
+ AMDGPU_MARKER_IRQ7 = 7,
+ AMDGPU_MARKER_IRQ_MAX = 47,
+ AMDGPU_MARKER_DEBUGFS = 63,
+};
+
+#define AMDGPU_MARKER_INDEX_IRQ(idx) (AMDGPU_MARKER_INDEX_IRQ0 + (idx))
+
struct amdgpu_ip_block_status {
bool valid;
bool sw;
bool hw;
bool late_initialized;
bool hang;
+ uint64_t markers;
};
struct amdgpu_ip_block_version {
@@ -400,6 +417,27 @@ amdgpu_device_ip_get_ip_block(struct amdgpu_device *adev,
int amdgpu_device_ip_block_add(struct amdgpu_device *adev,
const struct amdgpu_ip_block_version *ip_block_version);
+static inline void amdgpu_ip_block_set_marker(struct amdgpu_ip_block *ip_block,
+ enum amdgpu_marker marker)
+{
+ WARN_ON(ip_block->status.markers & (0x1ull << marker));
+ ip_block->status.markers |= 0x1ull << (int)marker;
+}
+
+static inline bool amdgpu_ip_block_test_and_clear_marker(struct amdgpu_ip_block *ip_block,
+ enum amdgpu_marker marker)
+{
+ bool set = false;
+ uint64_t value = 0x1ull << (int)marker;
+
+ if ((ip_block->status.markers & value) != 0 ) {
+ ip_block->status.markers &= ~value;
+ set = true;
+ }
+
+ return set;
+}
+
/*
* BIOS.
*/
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 71e8eafbbfbc..6d52e22691f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -5007,3 +5007,41 @@ bool amdgpu_ras_is_rma(struct amdgpu_device *adev)
return con->is_rma;
}
+
+bool amdgpu_ras_test_marker(struct amdgpu_device *adev,
+ struct ras_common_if *head, int marker)
+{
+ struct ras_manager *obj = amdgpu_ras_find_obj(adev, head);
+
+ if (obj && obj->markers & (0x1ull << marker)) {
+ return true;
+ }
+
+ return false;
+}
+
+void amdgpu_ras_set_marker(struct amdgpu_device *adev,
+ struct ras_common_if *head, int marker)
+{
+ struct ras_manager *obj = amdgpu_ras_find_obj(adev, head);
+
+ if (obj) {
+ WARN_ON(obj->markers & (0x1ull << marker));
+ obj->markers |= 0x1ull << marker;
+ }
+}
+
+bool amdgpu_ras_test_and_clear_marker(struct amdgpu_device *adev,
+ struct ras_common_if *head, int marker)
+{
+ bool set = false;
+ uint64_t value = 0x1ull << marker;
+ struct ras_manager *obj = amdgpu_ras_find_obj(adev, head);
+
+ if (obj && (obj->markers & value) != 0 ) {
+ obj->markers &= ~value;
+ set = true;
+ }
+
+ return set;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index b13debcf48ee..bc7377eaf819 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -632,6 +632,8 @@ struct ras_manager {
struct ras_common_if head;
/* reference count */
int use;
+ /* Flags for status tracking */
+ uint64_t markers;
/* ras block link */
struct list_head node;
/* the device */
@@ -975,4 +977,12 @@ void amdgpu_ras_event_log_print(struct amdgpu_device *adev, u64 event_id,
const char *fmt, ...);
bool amdgpu_ras_is_rma(struct amdgpu_device *adev);
+
+bool amdgpu_ras_test_marker(struct amdgpu_device *adev,
+ struct ras_common_if *head, int marker);
+void amdgpu_ras_set_marker(struct amdgpu_device *adev,
+ struct ras_common_if *head, int marker);
+bool amdgpu_ras_test_and_clear_marker(struct amdgpu_device *adev,
+ struct ras_common_if *head,
+ int marker);
#endif
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status
2025-01-08 13:59 Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 02/13] drm/admgpu: add helper functions to track status for ras manager Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 17:19 ` Mario Limonciello
2025-01-08 13:59 ` [RFC PATCH 04/13] drm/amdgpu: free all resources on error recovery path of amdgpu_ras_init() Jiang Liu
` (11 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Add a flag to track the ras debugfs creation status, to avoid possible
incorrect reference count management for the ras block object in
function amdgpu_ras_aca_is_supported().
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++++++--
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 32941f29507c..2ef7d3102be3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -378,7 +378,7 @@ enum amdgpu_marker {
AMDGPU_MARKER_IRQ6 = 6,
AMDGPU_MARKER_IRQ7 = 7,
AMDGPU_MARKER_IRQ_MAX = 47,
- AMDGPU_MARKER_DEBUGFS = 63,
+ AMDGPU_MARKER_RAS_DEBUGFS = 63,
};
#define AMDGPU_MARKER_INDEX_IRQ(idx) (AMDGPU_MARKER_INDEX_IRQ0 + (idx))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 6d52e22691f7..efd72b07a185 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1996,7 +1996,8 @@ static void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
{
struct ras_manager *obj = amdgpu_ras_find_obj(adev, &head->head);
- if (!obj || !dir)
+ if (!obj || !dir ||
+ amdgpu_ras_test_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS))
return;
get_obj(obj);
@@ -2007,6 +2008,8 @@ static void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
debugfs_create_file(obj->fs_data.debugfs_name, S_IWUGO | S_IRUGO, dir,
obj, &amdgpu_ras_debugfs_ops);
+
+ amdgpu_ras_set_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS);
}
static bool amdgpu_ras_aca_is_supported(struct amdgpu_device *adev)
@@ -2134,7 +2137,9 @@ static int amdgpu_ras_fs_fini(struct amdgpu_device *adev)
if (IS_ENABLED(CONFIG_DEBUG_FS)) {
list_for_each_entry_safe(con_obj, tmp, &con->head, node) {
ip_obj = amdgpu_ras_find_obj(adev, &con_obj->head);
- if (ip_obj)
+ if (ip_obj &&
+ amdgpu_ras_test_and_clear_marker(adev, &ip_obj->head,
+ AMDGPU_MARKER_RAS_DEBUGFS))
put_obj(ip_obj);
}
}
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 04/13] drm/amdgpu: free all resources on error recovery path of amdgpu_ras_init()
2025-01-08 13:59 Jiang Liu
` (2 preceding siblings ...)
2025-01-08 13:59 ` [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 05/13] drm/amdgpu: introduce a flag to track refcount held for features Jiang Liu
` (10 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Free all allocated resources on the error recovery path of
amdgpu_ras_init().
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index efd72b07a185..5e8838ffccaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3670,6 +3670,7 @@ static void amdgpu_ras_init_reserved_vram_size(struct amdgpu_device *adev)
int amdgpu_ras_init(struct amdgpu_device *adev)
{
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+ struct amdgpu_ras_block_list *ras_node, *tmp;
int r;
if (con)
@@ -3747,20 +3748,20 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
* to handle fatal error */
r = amdgpu_nbio_ras_sw_init(adev);
if (r)
- return r;
+ goto release_con;
if (adev->nbio.ras &&
adev->nbio.ras->init_ras_controller_interrupt) {
r = adev->nbio.ras->init_ras_controller_interrupt(adev);
if (r)
- goto release_con;
+ goto free_blocks;
}
if (adev->nbio.ras &&
adev->nbio.ras->init_ras_err_event_athub_interrupt) {
r = adev->nbio.ras->init_ras_err_event_athub_interrupt(adev);
if (r)
- goto release_con;
+ goto free_blocks;
}
/* Packed socket_id to ras feature mask bits[31:29] */
@@ -3776,7 +3777,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
if (amdgpu_ras_fs_init(adev)) {
r = -EINVAL;
- goto release_con;
+ goto free_blocks;
}
if (amdgpu_ras_aca_is_supported(adev)) {
@@ -3785,7 +3786,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
else
r = amdgpu_mca_init(adev);
if (r)
- goto release_con;
+ goto clear_ras_fs;
}
dev_info(adev->dev, "RAS INFO: ras initialized successfully, "
@@ -3793,6 +3794,14 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
adev->ras_hw_enabled, adev->ras_enabled);
return 0;
+
+clear_ras_fs:
+ amdgpu_ras_fs_fini(adev);
+free_blocks:
+ list_for_each_entry_safe(ras_node, tmp, &adev->ras_list, node) {
+ list_del(&ras_node->node);
+ kfree(ras_node);
+ }
release_con:
amdgpu_ras_set_context(adev, NULL);
kfree(con);
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 05/13] drm/amdgpu: introduce a flag to track refcount held for features
2025-01-08 13:59 Jiang Liu
` (3 preceding siblings ...)
2025-01-08 13:59 ` [RFC PATCH 04/13] drm/amdgpu: free all resources on error recovery path of amdgpu_ras_init() Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 06/13] drm/amdgpu: enhance amdgpu_ras_block_late_fini() Jiang Liu
` (9 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Currently we track the refcount held on the ras block object for
features by checking `if (obj && amdgpu_ras_is_feature_enabled(adev, head))`,
which is a little unreliable. So introduce a dedicated flag to track
the reference count.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2ef7d3102be3..fa19c5391d8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -378,6 +378,8 @@ enum amdgpu_marker {
AMDGPU_MARKER_IRQ6 = 6,
AMDGPU_MARKER_IRQ7 = 7,
AMDGPU_MARKER_IRQ_MAX = 47,
+ // used for ras blocks
+ AMDGPU_MARKER_RAS_FEATURE = 62,
AMDGPU_MARKER_RAS_DEBUGFS = 63,
};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 5e8838ffccaa..41978116b92b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -774,15 +774,20 @@ static int __amdgpu_ras_feature_enable(struct amdgpu_device *adev,
obj = amdgpu_ras_create_obj(adev, head);
if (!obj)
return -EINVAL;
- } else {
+ amdgpu_ras_set_marker(adev, head, AMDGPU_MARKER_RAS_FEATURE);
+ } else if (!amdgpu_ras_test_marker(adev, head,
+ AMDGPU_MARKER_RAS_FEATURE)) {
/* In case we create obj somewhere else */
get_obj(obj);
+ amdgpu_ras_set_marker(adev, head, AMDGPU_MARKER_RAS_FEATURE);
}
con->features |= BIT(head->block);
} else {
if (obj && amdgpu_ras_is_feature_enabled(adev, head)) {
con->features &= ~BIT(head->block);
- put_obj(obj);
+ if (amdgpu_ras_test_and_clear_marker(adev, head,
+ AMDGPU_MARKER_RAS_FEATURE))
+ put_obj(obj);
}
}
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 06/13] drm/amdgpu: enhance amdgpu_ras_block_late_fini()
2025-01-08 13:59 Jiang Liu
` (4 preceding siblings ...)
2025-01-08 13:59 ` [RFC PATCH 05/13] drm/amdgpu: introduce a flag to track refcount held for features Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 07/13] drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR Jiang Liu
` (8 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Enhance amdgpu_ras_block_late_fini() to revert what has been done
by amdgpu_ras_block_late_init(), and fix a possible resource leak
in function amdgpu_ras_block_late_init().
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 41978116b92b..040969d56541 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3898,13 +3898,13 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev,
ras_obj->hw_ops->query_ras_error_status)) {
r = amdgpu_ras_sysfs_create(adev, ras_block);
if (r)
- goto interrupt;
+ goto cleanup;
/* Those are the cached values at init.
*/
query_info = kzalloc(sizeof(*query_info), GFP_KERNEL);
if (!query_info)
- return -ENOMEM;
+ goto cleanup;
memcpy(&query_info->head, ras_block, sizeof(struct ras_common_if));
if (amdgpu_ras_query_error_count(adev, &ce_count, &ue_count, query_info) == 0) {
@@ -3917,11 +3917,8 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev,
return 0;
-interrupt:
- if (ras_obj->ras_cb)
- amdgpu_ras_interrupt_remove_handler(adev, ras_block);
cleanup:
- amdgpu_ras_feature_enable(adev, ras_block, 0);
+ amdgpu_ras_block_late_fini(adev, ras_block);
return r;
}
@@ -3936,9 +3933,16 @@ void amdgpu_ras_block_late_fini(struct amdgpu_device *adev,
struct ras_common_if *ras_block)
{
struct amdgpu_ras_block_object *ras_obj;
+
if (!ras_block)
return;
+ amdgpu_ras_feature_enable(adev, ras_block, 0);
+
+ /* in resume/reset phase, no need to delete ras fs node */
+ if (adev->in_suspend || amdgpu_in_reset(adev))
+ return;
+
amdgpu_ras_sysfs_remove(adev, ras_block);
ras_obj = container_of(ras_block, struct amdgpu_ras_block_object, ras_comm);
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 07/13] drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR
2025-01-08 13:59 Jiang Liu
` (5 preceding siblings ...)
2025-01-08 13:59 ` [RFC PATCH 06/13] drm/amdgpu: enhance amdgpu_ras_block_late_fini() Jiang Liu
@ 2025-01-08 13:59 ` Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 08/13] drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() Jiang Liu
` (7 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Enhance amdgpu_ras_pre_fini() to better support suspend/resume:
1) fix a possible resource leak: amdgpu_release_ras_context() only
kfree()s con but doesn't release the resources associated with the
con object.
2) call amdgpu_ras_pre_fini() in amdgpu_device_suspend() to undo what
has been done by amdgpu_ras_late_init(), because amdgpu_device_resume()
will invoke amdgpu_ras_late_init() on resume.
3) move amdgpu_ras_recovery_fini() from amdgpu_ras_pre_fini() to
amdgpu_ras_fini().
4) move the call to `obj->ras_fini()` from amdgpu_ras_fini() to
amdgpu_ras_pre_fini().
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 45 ++++++++++++++--------
2 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 602322bd7cb8..7ee8f9d73c3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4599,6 +4599,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
return 0;
release_ras_con:
+ amdgpu_ras_pre_fini(adev);
+ amdgpu_ras_fini(adev);
if (amdgpu_sriov_vf(adev))
amdgpu_virt_release_full_gpu(adev, true);
@@ -4613,8 +4615,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->virt.ops = NULL;
r = -EAGAIN;
}
- amdgpu_release_ras_context(adev);
-
failed:
amdgpu_vf_error_trans_all(adev);
@@ -4875,6 +4875,8 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
cancel_delayed_work_sync(&adev->delayed_init_work);
+ /* disable ras feature must before hw fini */
+ amdgpu_ras_pre_fini(adev);
amdgpu_ras_suspend(adev);
amdgpu_device_ip_suspend_phase1(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 040969d56541..bc90035ee25b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -4064,42 +4064,50 @@ int amdgpu_ras_late_init(struct amdgpu_device *adev)
int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
{
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+ struct amdgpu_ras_block_list *node, *tmp;
+ struct amdgpu_ras_block_object *obj;
- if (!adev->ras_enabled || !con)
- return 0;
+ if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_ras_telemetry_en(adev))
+ goto disable;
+ list_for_each_entry_safe(node, tmp, &adev->ras_list, node) {
+ obj = node->ras_obj;
+ if (!obj) {
+ continue;
+ }
+
+ if (!amdgpu_ras_is_supported(adev, obj->ras_comm.block))
+ continue;
+
+ if (obj->ras_fini) {
+ obj->ras_fini(adev, &obj->ras_comm);
+ } else
+ amdgpu_ras_block_late_fini_default(adev, &obj->ras_comm);
+ }
+disable:
/* Need disable ras on all IPs here before ip [hw/sw]fini */
- if (AMDGPU_RAS_GET_FEATURES(con->features))
+ if (con && AMDGPU_RAS_GET_FEATURES(con->features))
amdgpu_ras_disable_all_features(adev, 0);
- amdgpu_ras_recovery_fini(adev);
+
return 0;
}
int amdgpu_ras_fini(struct amdgpu_device *adev)
{
struct amdgpu_ras_block_list *ras_node, *tmp;
- struct amdgpu_ras_block_object *obj = NULL;
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
if (!adev->ras_enabled || !con)
- return 0;
+ goto out_free_context;
list_for_each_entry_safe(ras_node, tmp, &adev->ras_list, node) {
- if (ras_node->ras_obj) {
- obj = ras_node->ras_obj;
- if (amdgpu_ras_is_supported(adev, obj->ras_comm.block) &&
- obj->ras_fini)
- obj->ras_fini(adev, &obj->ras_comm);
- else
- amdgpu_ras_block_late_fini_default(adev, &obj->ras_comm);
- }
-
/* Clear ras blocks from ras_list and free ras block list node */
list_del(&ras_node->node);
kfree(ras_node);
}
+ amdgpu_ras_recovery_fini(adev);
amdgpu_ras_fs_fini(adev);
amdgpu_ras_interrupt_remove_all(adev);
@@ -4117,8 +4125,11 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
cancel_delayed_work_sync(&con->ras_counte_delay_work);
- amdgpu_ras_set_context(adev, NULL);
- kfree(con);
+out_free_context:
+ if (con) {
+ amdgpu_ras_set_context(adev, NULL);
+ kfree(con);
+ }
return 0;
}
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 08/13] drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini()
2025-01-08 13:59 Jiang Liu
` (6 preceding siblings ...)
2025-01-08 13:59 ` [RFC PATCH 07/13] drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR Jiang Liu
@ 2025-01-08 14:00 ` Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 09/13] drm/amdgpu: make IP block state machine works in stack like way Jiang Liu
` (6 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 14:00 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini(), to keep the
same naming style as other code.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +-
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +-
12 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7ee8f9d73c3c..0e69c7c7fe1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4599,7 +4599,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
return 0;
release_ras_con:
- amdgpu_ras_pre_fini(adev);
+ amdgpu_ras_early_fini(adev);
amdgpu_ras_fini(adev);
if (amdgpu_sriov_vf(adev))
amdgpu_virt_release_full_gpu(adev, true);
@@ -4690,7 +4690,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_reg_state_sysfs_fini(adev);
/* disable ras feature must before hw fini */
- amdgpu_ras_pre_fini(adev);
+ amdgpu_ras_early_fini(adev);
amdgpu_ttm_set_buffer_funcs_status(adev, false);
@@ -4876,7 +4876,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
cancel_delayed_work_sync(&adev->delayed_init_work);
/* disable ras feature must before hw fini */
- amdgpu_ras_pre_fini(adev);
+ amdgpu_ras_early_fini(adev);
amdgpu_ras_suspend(adev);
amdgpu_device_ip_suspend_phase1(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 8c9fcfb7f22e..84834892236f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -899,7 +899,7 @@ int amdgpu_gfx_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *r
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 0a6397e3b8a7..c6b3777c4792 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -300,7 +300,7 @@ int amdgpu_jpeg_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
index d085687a47ea..c75ce91f94ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
@@ -71,6 +71,6 @@ int amdgpu_nbio_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index bc90035ee25b..5335942d3cdf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3918,7 +3918,7 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev,
return 0;
cleanup:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
@@ -3929,7 +3929,7 @@ static int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev,
}
/* helper function to remove ras fs node and interrupt handler */
-void amdgpu_ras_block_late_fini(struct amdgpu_device *adev,
+void amdgpu_ras_block_early_fini(struct amdgpu_device *adev,
struct ras_common_if *ras_block)
{
struct amdgpu_ras_block_object *ras_obj;
@@ -3950,10 +3950,10 @@ void amdgpu_ras_block_late_fini(struct amdgpu_device *adev,
amdgpu_ras_interrupt_remove_handler(adev, ras_block);
}
-static void amdgpu_ras_block_late_fini_default(struct amdgpu_device *adev,
+static void amdgpu_ras_block_early_fini_default(struct amdgpu_device *adev,
struct ras_common_if *ras_block)
{
- return amdgpu_ras_block_late_fini(adev, ras_block);
+ return amdgpu_ras_block_early_fini(adev, ras_block);
}
/* do some init work after IP late init as dependence.
@@ -4061,7 +4061,7 @@ int amdgpu_ras_late_init(struct amdgpu_device *adev)
}
/* do some fini work before IP fini as dependence */
-int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
+int amdgpu_ras_early_fini(struct amdgpu_device *adev)
{
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
struct amdgpu_ras_block_list *node, *tmp;
@@ -4079,10 +4079,10 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
if (!amdgpu_ras_is_supported(adev, obj->ras_comm.block))
continue;
- if (obj->ras_fini) {
- obj->ras_fini(adev, &obj->ras_comm);
+ if (obj->ras_early_fini) {
+ obj->ras_early_fini(adev, &obj->ras_comm);
} else
- amdgpu_ras_block_late_fini_default(adev, &obj->ras_comm);
+ amdgpu_ras_block_early_fini_default(adev, &obj->ras_comm);
}
disable:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index bc7377eaf819..56665487be2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -709,7 +709,7 @@ struct amdgpu_ras_block_object {
int (*ras_block_match)(struct amdgpu_ras_block_object *block_obj,
enum amdgpu_ras_block block, uint32_t sub_block_index);
int (*ras_late_init)(struct amdgpu_device *adev, struct ras_common_if *ras_block);
- void (*ras_fini)(struct amdgpu_device *adev, struct ras_common_if *ras_block);
+ void (*ras_early_fini)(struct amdgpu_device *adev, struct ras_common_if *ras_block);
ras_ih_cb ras_cb;
const struct amdgpu_ras_block_hw_ops *hw_ops;
};
@@ -823,13 +823,13 @@ amdgpu_ras_error_to_ta(enum amdgpu_ras_error_type error) {
/* called in ip_init and ip_fini */
int amdgpu_ras_init(struct amdgpu_device *adev);
int amdgpu_ras_late_init(struct amdgpu_device *adev);
+int amdgpu_ras_early_fini(struct amdgpu_device *adev);
int amdgpu_ras_fini(struct amdgpu_device *adev);
-int amdgpu_ras_pre_fini(struct amdgpu_device *adev);
int amdgpu_ras_block_late_init(struct amdgpu_device *adev,
struct ras_common_if *ras_block);
-void amdgpu_ras_block_late_fini(struct amdgpu_device *adev,
+void amdgpu_ras_block_early_fini(struct amdgpu_device *adev,
struct ras_common_if *ras_block);
int amdgpu_ras_feature_enable(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
index 183a976ba29d..21938e858d55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
@@ -116,7 +116,7 @@ int amdgpu_sdma_ras_late_init(struct amdgpu_device *adev,
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index b9fec295b9a6..31a8a12420cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -339,7 +339,7 @@ int amdgpu_umc_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *r
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index aecb78e0519f..b2cf68db650c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -1239,7 +1239,7 @@ int amdgpu_vcn_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *r
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 8e697273d2ac..99722b5be315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -1122,7 +1122,7 @@ static int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev, struct ras_comm
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 4d5b48dbc005..a0a1af0ffc54 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -5062,7 +5062,7 @@ static int gfx_v9_4_3_ras_late_init(struct amdgpu_device *adev, struct ras_commo
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
index b01bb759d0f4..4beec0b980b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
@@ -790,7 +790,7 @@ static int mmhub_v1_8_ras_late_init(struct amdgpu_device *adev, struct ras_commo
return 0;
late_fini:
- amdgpu_ras_block_late_fini(adev, ras_block);
+ amdgpu_ras_block_early_fini(adev, ras_block);
return r;
}
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 09/13] drm/amdgpu: make IP block state machine works in stack like way
2025-01-08 13:59 Jiang Liu
` (7 preceding siblings ...)
2025-01-08 14:00 ` [RFC PATCH 08/13] drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() Jiang Liu
@ 2025-01-08 14:00 ` Jiang Liu
2025-01-08 17:04 ` Mario Limonciello
2025-01-08 14:00 ` [RFC PATCH 10/13] drm/admgpu: make device state machine work " Jiang Liu
` (5 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 14:00 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
There are some mismatches between the IP block state machine and its
associated status flags, especially regarding the meaning of
`status.late_initialized`. So let's make the state machine and
associated status flags work in a stack-like way as below:
Callback Status
early_init: valid = true
sw_init: sw = true
hw_init: hw = true
late_init: late_initialized = true
early_fini: late_initialized = false
hw_fini: hw = false
sw_fini: sw = false
late_fini: valid = false
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e69c7c7fe1f..36a33a391411 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3332,6 +3332,8 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
DRM_DEBUG("early_fini of IP block <%s> failed %d\n",
adev->ip_blocks[i].version->funcs->name, r);
}
+
+ adev->ip_blocks[i].status.late_initialized = false;
}
amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
@@ -3407,15 +3409,14 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
adev->ip_blocks[i].version->funcs->name, r);
}
adev->ip_blocks[i].status.sw = false;
- adev->ip_blocks[i].status.valid = false;
}
for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
- if (!adev->ip_blocks[i].status.late_initialized)
+ if (!adev->ip_blocks[i].status.valid)
continue;
if (adev->ip_blocks[i].version->funcs->late_fini)
adev->ip_blocks[i].version->funcs->late_fini(&adev->ip_blocks[i]);
- adev->ip_blocks[i].status.late_initialized = false;
+ adev->ip_blocks[i].status.valid = false;
}
amdgpu_ras_fini(adev);
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 10/13] drm/admgpu: make device state machine work in stack like way
2025-01-08 13:59 Jiang Liu
` (8 preceding siblings ...)
2025-01-08 14:00 ` [RFC PATCH 09/13] drm/amdgpu: make IP block state machine works in stack like way Jiang Liu
@ 2025-01-08 14:00 ` Jiang Liu
2025-01-08 17:13 ` Mario Limonciello
2025-01-08 14:00 ` [RFC PATCH 11/13] drm/amdgpu/sdma: improve the way to manage irq reference count Jiang Liu
` (4 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 14:00 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Make the device state machine work in a stack-like way to better
support suspend/resume with the following changes:
1. amdgpu_driver_load_kms()
amdgpu_device_init()
amdgpu_device_ip_early_init()
ip_blocks[i].early_init()
ip_blocks[i].status.valid = true
amdgpu_device_ip_init()
amdgpu_ras_init()
ip_blocks[i].sw_init()
ip_blocks[i].status.sw = true
ip_blocks[i].hw_init()
ip_blocks[i].status.hw = true
amdgpu_device_ip_late_init()
ip_blocks[i].late_init()
ip_blocks[i].status.late_initialized = true
amdgpu_ras_late_init()
ras_blocks[i].ras_late_init()
amdgpu_ras_feature_enable_on_boot()
2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
amdgpu_device_suspend()
amdgpu_ras_early_fini()
ras_blocks[i].ras_early_fini()
amdgpu_ras_feature_disable()
amdgpu_ras_suspend()
amdgpu_ras_disable_all_features()
+++ ip_blocks[i].early_fini()
+++ ip_blocks[i].status.late_initialized = false
ip_blocks[i].suspend()
3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
amdgpu_device_resume()
amdgpu_device_ip_resume()
ip_blocks[i].resume()
amdgpu_device_ip_late_init()
ip_blocks[i].late_init()
ip_blocks[i].status.late_initialized = true
amdgpu_ras_late_init()
ras_blocks[i].ras_late_init()
amdgpu_ras_feature_enable_on_boot()
amdgpu_ras_resume()
amdgpu_ras_enable_all_features()
4. amdgpu_driver_unload_kms()
amdgpu_device_fini_hw()
amdgpu_ras_early_fini()
ras_blocks[i].ras_early_fini()
+++ ip_blocks[i].early_fini()
+++ ip_blocks[i].status.late_initialized = false
ip_blocks[i].hw_fini()
ip_blocks[i].status.hw = false
5. amdgpu_driver_release_kms()
amdgpu_device_fini_sw()
amdgpu_device_ip_fini()
ip_blocks[i].sw_fini()
ip_blocks[i].status.sw = false
--- ip_blocks[i].status.valid = false
+++ amdgpu_ras_fini()
ip_blocks[i].late_fini()
+++ ip_blocks[i].status.valid = false
--- ip_blocks[i].status.late_initialized = false
--- amdgpu_ras_fini()
The main changes include:
1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
Currently there's only one ip block which provides an `early_fini`
callback. We have added a check of `in_s3` to keep the current behavior
in function amdgpu_dm_early_fini(). So there should be no functional
changes.
2) set ip_blocks[i].status.late_initialized to false after calling the
`early_fini` callback. We have audited all usages of the
late_initialized flag and found no functional changes.
3) only set ip_blocks[i].status.valid = false after calling the
`late_fini` callback.
4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini().
There's one more task left: analyzing GPU reset related state machine
transitions.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 +++++++++++++++++--
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
2 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 36a33a391411..5c6b39e5cdaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3411,6 +3411,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
adev->ip_blocks[i].status.sw = false;
}
+ amdgpu_ras_fini(adev);
+
for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
if (!adev->ip_blocks[i].status.valid)
continue;
@@ -3419,8 +3421,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
adev->ip_blocks[i].status.valid = false;
}
- amdgpu_ras_fini(adev);
-
return 0;
}
@@ -3478,6 +3478,24 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW))
dev_warn(adev->dev, "Failed to disallow df cstate");
+ for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
+ if (!adev->ip_blocks[i].status.valid)
+ continue;
+ if (!adev->ip_blocks[i].status.late_initialized)
+ continue;
+
+ if (adev->ip_blocks[i].version->funcs->early_fini) {
+ r = adev->ip_blocks[i].version->funcs->early_fini(&adev->ip_blocks[i]);
+ if (r) {
+ DRM_ERROR("early_fini of IP block <%s> failed %d\n",
+ adev->ip_blocks[i].version->funcs->name, r);
+ return r;
+ }
+ }
+
+ adev->ip_blocks[i].status.late_initialized = false;
+ }
+
for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
if (!adev->ip_blocks[i].status.valid)
continue;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index f622eb1551df..33a1a795c761 100755
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2175,6 +2175,9 @@ static int amdgpu_dm_early_fini(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
+ if (adev->in_s0ix || adev->in_s3 || adev->in_s4 || adev->in_suspend)
+ return 0;
+
amdgpu_dm_audio_fini(adev);
return 0;
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 11/13] drm/amdgpu/sdma: improve the way to manage irq reference count
2025-01-08 13:59 Jiang Liu
` (9 preceding siblings ...)
2025-01-08 14:00 ` [RFC PATCH 10/13] drm/admgpu: make device state machine work " Jiang Liu
@ 2025-01-08 14:00 ` Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 12/13] drm/amdgpu/nbio: " Jiang Liu
` (3 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 14:00 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Refactor sdma related code to improve the way the irq reference count
is managed. Originally amdgpu_irq_get() is called from
ip_blocks[].late_init and amdgpu_irq_put() is called from
ip_blocks[].hw_fini. This asymmetric design may cause issues under
certain conditions. So:
1) introduce amdgpu_sdma_ras_early_fini() to undo the work done by
amdgpu_sdma_ras_late_init().
2) remove the call to amdgpu_irq_put() from xxxx_hw_fini().
3) call amdgpu_irq_get() in sdma_v4_4_2_xcp_resume() to keep the
irq reference count balanced. Currently sdma_v4_4_2_xcp_resume()
doesn't invoke ip_blocks[].late_init() (which calls amdgpu_irq_get()),
but sdma_v4_4_2_xcp_suspend() invokes amdgpu_irq_put(), which causes
an unbalanced irq reference count. Fix it by calling amdgpu_irq_get()
in sdma_v4_4_2_xcp_resume().
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 ++++++++++++++++++++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 ++
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 --------
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 ++++++++++++---------
5 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index fa19c5391d8c..ff5907f2c544 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -383,7 +383,7 @@ enum amdgpu_marker {
AMDGPU_MARKER_RAS_DEBUGFS = 63,
};
-#define AMDGPU_MARKER_INDEX_IRQ(idx) (AMDGPU_MARKER_INDEX_IRQ0 + (idx))
+#define AMDGPU_MARKER_IRQ(idx) (AMDGPU_MARKER_IRQ0 + (idx))
struct amdgpu_ip_block_status {
bool valid;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
index 21938e858d55..799bcd9978da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
@@ -110,16 +110,35 @@ int amdgpu_sdma_ras_late_init(struct amdgpu_device *adev,
AMDGPU_SDMA_IRQ_INSTANCE0 + i);
if (r)
goto late_fini;
+ amdgpu_ras_set_marker(adev, ras_block, AMDGPU_MARKER_IRQ(i));
}
}
return 0;
late_fini:
- amdgpu_ras_block_early_fini(adev, ras_block);
+ amdgpu_sdma_ras_early_fini(adev, ras_block);
return r;
}
+void amdgpu_sdma_ras_early_fini(struct amdgpu_device *adev,
+ struct ras_common_if *ras_block)
+{
+ int i;
+
+ if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA)) {
+ for (i = 0; i < adev->sdma.num_instances; i++) {
+ if (amdgpu_ras_test_and_clear_marker(adev, ras_block,
+ AMDGPU_MARKER_IRQ(i))) {
+ amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
+ AMDGPU_SDMA_IRQ_INSTANCE0 + i);
+ }
+ }
+ }
+
+ amdgpu_ras_block_early_fini(adev, ras_block);
+}
+
int amdgpu_sdma_process_ras_data_cb(struct amdgpu_device *adev,
void *err_data,
struct amdgpu_iv_entry *entry)
@@ -334,8 +353,11 @@ int amdgpu_sdma_ras_sw_init(struct amdgpu_device *adev)
adev->sdma.ras_if = &ras->ras_block.ras_comm;
/* If not define special ras_late_init function, use default ras_late_init */
- if (!ras->ras_block.ras_late_init)
+ if (!ras->ras_block.ras_late_init) {
+ WARN_ON(ras->ras_block.ras_early_fini);
ras->ras_block.ras_late_init = amdgpu_sdma_ras_late_init;
+ ras->ras_block.ras_early_fini = amdgpu_sdma_ras_early_fini;
+ }
/* If not defined special ras_cb function, use default ras_cb */
if (!ras->ras_block.ras_cb)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
index 087ce0f6fa07..1915e6c9be63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
@@ -164,6 +164,8 @@ int amdgpu_sdma_get_index_from_ring(struct amdgpu_ring *ring, uint32_t *index);
uint64_t amdgpu_sdma_get_csa_mc_addr(struct amdgpu_ring *ring, unsigned vmid);
int amdgpu_sdma_ras_late_init(struct amdgpu_device *adev,
struct ras_common_if *ras_block);
+void amdgpu_sdma_ras_early_fini(struct amdgpu_device *adev,
+ struct ras_common_if *ras_block);
int amdgpu_sdma_process_ras_data_cb(struct amdgpu_device *adev,
void *err_data,
struct amdgpu_iv_entry *entry);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index ccf0d531776d..369d7094a3ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1968,18 +1968,10 @@ static int sdma_v4_0_hw_init(struct amdgpu_ip_block *ip_block)
static int sdma_v4_0_hw_fini(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
- int i;
if (amdgpu_sriov_vf(adev))
return 0;
- if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA)) {
- for (i = 0; i < adev->sdma.num_instances; i++) {
- amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
- AMDGPU_SDMA_IRQ_INSTANCE0 + i);
- }
- }
-
sdma_v4_0_ctx_switch_enable(adev, false);
sdma_v4_0_enable(adev, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 9c7cea0890c9..744569bbc1e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1486,19 +1486,11 @@ static int sdma_v4_4_2_hw_fini(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
uint32_t inst_mask;
- int i;
if (amdgpu_sriov_vf(adev))
return 0;
inst_mask = GENMASK(adev->sdma.num_instances - 1, 0);
- if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA)) {
- for (i = 0; i < adev->sdma.num_instances; i++) {
- amdgpu_irq_put(adev, &adev->sdma.ecc_irq,
- AMDGPU_SDMA_IRQ_INSTANCE0 + i);
- }
- }
-
sdma_v4_4_2_inst_ctx_switch_enable(adev, false, inst_mask);
sdma_v4_4_2_inst_enable(adev, false, inst_mask);
@@ -2153,14 +2145,24 @@ const struct amdgpu_ip_block_version sdma_v4_4_2_ip_block = {
static int sdma_v4_4_2_xcp_resume(void *handle, uint32_t inst_mask)
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
- int r;
+ uint32_t tmp_mask = inst_mask;
+ int r, i;
if (!amdgpu_sriov_vf(adev))
sdma_v4_4_2_inst_init_golden_registers(adev, inst_mask);
r = sdma_v4_4_2_inst_start(adev, inst_mask);
+ if (r)
+ return r;
- return r;
+ if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA)) {
+ for_each_inst(i, tmp_mask) {
+ amdgpu_irq_get(adev, &adev->sdma.ecc_irq,
+ AMDGPU_SDMA_IRQ_INSTANCE0 + i);
+ }
+ }
+
+ return 0;
}
static int sdma_v4_4_2_xcp_suspend(void *handle, uint32_t inst_mask)
@@ -2366,6 +2368,7 @@ static struct amdgpu_sdma_ras sdma_v4_4_2_ras = {
.ras_block = {
.hw_ops = &sdma_v4_4_2_ras_hw_ops,
.ras_late_init = sdma_v4_4_2_ras_late_init,
+ .ras_early_fini = amdgpu_sdma_ras_early_fini,
},
};
--
2.43.5
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 12/13] drm/amdgpu/nbio: improve the way to manage irq reference count
2025-01-08 13:59 Jiang Liu
` (10 preceding siblings ...)
2025-01-08 14:00 ` [RFC PATCH 11/13] drm/amdgpu/sdma: improve the way to manage irq reference count Jiang Liu
@ 2025-01-08 14:00 ` Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 13/13] drm/amdgpu/asic: make ip block operations symmetric by .early_fini() Jiang Liu
` (2 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 14:00 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Refactor nbio-related code to improve the way the irq reference count
is managed. Originally amdgpu_irq_get() is called from ip_blocks[].late_init
while amdgpu_irq_put() is called from ip_blocks[].hw_fini. This asymmetric
design may cause issues under certain conditions. So:
1) introduce amdgpu_nbio_ras_early_fini() to undo the work done by
amdgpu_nbio_ras_late_init().
2) remove the call to amdgpu_irq_put() from xxxx_hw_fini().
3) record whether a reference count is held for each specific irq (see
the sketch below for the assumed marker semantics).
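A minimal standalone sketch of the marker semantics assumed here (the
bitmask representation and the local names are illustrative only; the real
calls are the amdgpu_ras_set_marker()/amdgpu_ras_test_and_clear_marker()
helpers used in the diff below):
```c
#include <stdbool.h>
#include <stdint.h>

/*
 * One bit per AMDGPU_MARKER_* slot: a set bit records that the matching
 * amdgpu_irq_get() succeeded, so the fini path only issues the paired
 * amdgpu_irq_put() when the reference is actually held.
 */
struct marker_state {
	uint64_t bits;
};

static void marker_set(struct marker_state *s, unsigned int marker)
{
	s->bits |= 1ULL << marker;
}

static bool marker_test_and_clear(struct marker_state *s, unsigned int marker)
{
	bool held = s->bits & (1ULL << marker);

	s->bits &= ~(1ULL << marker);
	return held;
}
```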
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +++++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 +
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 +
drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 +
drivers/gpu/drm/amd/amdgpu/soc15.c | 16 ----------------
5 files changed, 18 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
index c75ce91f94ab..b8a69ceec2e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c
@@ -64,13 +64,27 @@ int amdgpu_nbio_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *
r = amdgpu_irq_get(adev, &adev->nbio.ras_controller_irq, 0);
if (r)
goto late_fini;
+ amdgpu_ras_set_marker(adev, ras_block, AMDGPU_MARKER_IRQ0);
r = amdgpu_irq_get(adev, &adev->nbio.ras_err_event_athub_irq, 0);
if (r)
goto late_fini;
+ amdgpu_ras_set_marker(adev, ras_block, AMDGPU_MARKER_IRQ1);
}
return 0;
late_fini:
- amdgpu_ras_block_early_fini(adev, ras_block);
+ amdgpu_nbio_ras_early_fini(adev, ras_block);
return r;
}
+
+void amdgpu_nbio_ras_early_fini(struct amdgpu_device *adev, struct ras_common_if *ras_block)
+{
+ if (amdgpu_ras_is_supported(adev, adev->nbio.ras_if->block)) {
+ if (amdgpu_ras_test_and_clear_marker(adev, ras_block, AMDGPU_MARKER_IRQ0))
+ amdgpu_irq_put(adev, &adev->nbio.ras_controller_irq, 0);
+ if (amdgpu_ras_test_and_clear_marker(adev, ras_block, AMDGPU_MARKER_IRQ1))
+ amdgpu_irq_put(adev, &adev->nbio.ras_err_event_athub_irq, 0);
+ }
+
+ amdgpu_ras_block_early_fini(adev, ras_block);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
index 79c2f807b9fe..e1edf75602c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h
@@ -117,6 +117,7 @@ struct amdgpu_nbio {
int amdgpu_nbio_ras_sw_init(struct amdgpu_device *adev);
int amdgpu_nbio_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block);
+void amdgpu_nbio_ras_early_fini(struct amdgpu_device *adev, struct ras_common_if *ras_block);
u64 amdgpu_nbio_get_pcie_replay_count(struct amdgpu_device *adev);
#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 97782a73f4b0..6c727b77bb3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -665,6 +665,7 @@ struct amdgpu_nbio_ras nbio_v7_4_ras = {
},
.hw_ops = &nbio_v7_4_ras_hw_ops,
.ras_late_init = amdgpu_nbio_ras_late_init,
+ .ras_early_fini = amdgpu_nbio_ras_early_fini,
},
.handle_ras_controller_intr_no_bifring = nbio_v7_4_handle_ras_controller_intr_no_bifring,
.handle_ras_err_event_athub_intr_no_bifring = nbio_v7_4_handle_ras_err_event_athub_intr_no_bifring,
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index 8a0a63ac88d2..684a38a16247 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -703,6 +703,7 @@ struct amdgpu_nbio_ras nbio_v7_9_ras = {
},
.hw_ops = &nbio_v7_9_ras_hw_ops,
.ras_late_init = amdgpu_nbio_ras_late_init,
+ .ras_early_fini = amdgpu_nbio_ras_early_fini,
},
.handle_ras_controller_intr_no_bifring = nbio_v7_9_handle_ras_controller_intr_no_bifring,
.handle_ras_err_event_athub_intr_no_bifring = nbio_v7_9_handle_ras_err_event_athub_intr_no_bifring,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 6fcdeb265a22..1dca7d7c813c 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1299,22 +1299,6 @@ static int soc15_common_hw_fini(struct amdgpu_ip_block *ip_block)
if (amdgpu_sriov_vf(adev))
xgpu_ai_mailbox_put_irq(adev);
- /*
- * For minimal init, late_init is not called, hence RAS irqs are not
- * enabled.
- */
- if ((!amdgpu_sriov_vf(adev)) &&
- (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
- adev->nbio.ras_if &&
- amdgpu_ras_is_supported(adev, adev->nbio.ras_if->block)) {
- if (adev->nbio.ras &&
- adev->nbio.ras->init_ras_controller_interrupt)
- amdgpu_irq_put(adev, &adev->nbio.ras_controller_irq, 0);
- if (adev->nbio.ras &&
- adev->nbio.ras->init_ras_err_event_athub_interrupt)
- amdgpu_irq_put(adev, &adev->nbio.ras_err_event_athub_irq, 0);
- }
-
return 0;
}
--
2.43.5
* [RFC PATCH 13/13] drm/amdgpu/asic: make ip block operations symmetric by .early_fini()
2025-01-08 13:59 Jiang Liu
` (11 preceding siblings ...)
2025-01-08 14:00 ` [RFC PATCH 12/13] drm/amdgpu/nbio: " Jiang Liu
@ 2025-01-08 14:00 ` Jiang Liu
2025-01-08 14:10 ` Christian König
2025-01-08 16:33 ` Re: Mario Limonciello
14 siblings, 0 replies; 27+ messages in thread
From: Jiang Liu @ 2025-01-08 14:00 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Cc: Jiang Liu
Make the asic ip block operations symmetric by making use of the
.early_fini() hook, which undoes the work done by the .late_init() hook.
1) introduce xxx_common_early_fini() for nv/soc15/soc21/soc24.
2) move `enable_doorbell_selfring_aperture(adev, false)` from .hw_fini()
into .early_fini().
3) call xgpu_nv_mailbox_put_irq() in nv.c to avoid possible resource
leakage.
4) use flags to track irq reference count usage.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
---
drivers/gpu/drm/amd/amdgpu/nv.c | 14 +++++++++++-
drivers/gpu/drm/amd/amdgpu/soc15.c | 22 +++++++++++--------
drivers/gpu/drm/amd/amdgpu/soc21.c | 35 ++++++++++++++++++++----------
drivers/gpu/drm/amd/amdgpu/soc24.c | 17 +++++++++++----
4 files changed, 63 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 4e8f9af1e2be..f87b9c835774 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -973,6 +973,18 @@ static int nv_common_late_init(struct amdgpu_ip_block *ip_block)
return 0;
}
+static int nv_common_early_fini(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+
+ adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
+
+ if (amdgpu_sriov_vf(adev))
+ xgpu_nv_mailbox_put_irq(adev);
+
+ return 0;
+}
+
static int nv_common_sw_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1024,7 +1036,6 @@ static int nv_common_hw_fini(struct amdgpu_ip_block *ip_block)
* selfring doorbell.
*/
adev->nbio.funcs->enable_doorbell_aperture(adev, false);
- adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
return 0;
}
@@ -1110,6 +1121,7 @@ static const struct amd_ip_funcs nv_common_ip_funcs = {
.name = "nv_common",
.early_init = nv_common_early_init,
.late_init = nv_common_late_init,
+ .early_fini = nv_common_early_fini,
.sw_init = nv_common_sw_init,
.sw_fini = nv_common_sw_fini,
.hw_init = nv_common_hw_init,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 1dca7d7c813c..e084430f8998 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1217,6 +1217,18 @@ static int soc15_common_late_init(struct amdgpu_ip_block *ip_block)
return 0;
}
+static int soc15_common_early_fini(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+
+ adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
+
+ if (amdgpu_sriov_vf(adev))
+ xgpu_ai_mailbox_put_irq(adev);
+
+ return 0;
+}
+
static int soc15_common_sw_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -1288,16 +1300,7 @@ static int soc15_common_hw_fini(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
- /* Disable the doorbell aperture and selfring doorbell aperture
- * separately in hw_fini because soc15_enable_doorbell_aperture
- * has been removed and there is no need to delay disabling
- * selfring doorbell.
- */
adev->nbio.funcs->enable_doorbell_aperture(adev, false);
- adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
-
- if (amdgpu_sriov_vf(adev))
- xgpu_ai_mailbox_put_irq(adev);
return 0;
}
@@ -1476,6 +1479,7 @@ static const struct amd_ip_funcs soc15_common_ip_funcs = {
.name = "soc15_common",
.early_init = soc15_common_early_init,
.late_init = soc15_common_late_init,
+ .early_fini = soc15_common_early_fini,
.sw_init = soc15_common_sw_init,
.sw_fini = soc15_common_sw_fini,
.hw_init = soc15_common_hw_init,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 03b9bcb8eb6d..5a6da83c938a 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -796,6 +796,7 @@ static int soc21_common_early_init(struct amdgpu_ip_block *ip_block)
static int soc21_common_late_init(struct amdgpu_ip_block *ip_block)
{
+ int r;
struct amdgpu_device *adev = ip_block->adev;
if (amdgpu_sriov_vf(adev)) {
@@ -816,12 +817,16 @@ static int soc21_common_late_init(struct amdgpu_ip_block *ip_block)
}
} else {
if (adev->nbio.ras &&
- adev->nbio.ras_err_event_athub_irq.funcs)
+ adev->nbio.ras_err_event_athub_irq.funcs) {
/* don't need to fail gpu late init
* if enabling athub_err_event interrupt failed
* nbio v4_3 only support fatal error hanlding
* just enable the interrupt directly */
- amdgpu_irq_get(adev, &adev->nbio.ras_err_event_athub_irq, 0);
+ r = amdgpu_irq_get(adev, &adev->nbio.ras_err_event_athub_irq, 0);
+ if (r)
+ return r;
+ amdgpu_ip_block_set_marker(ip_block, AMDGPU_MARKER_IRQ0);
+ }
}
/* Enable selfring doorbell aperture late because doorbell BAR
@@ -832,6 +837,22 @@ static int soc21_common_late_init(struct amdgpu_ip_block *ip_block)
return 0;
}
+static int soc21_common_early_fini(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+
+ adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
+
+ if (amdgpu_sriov_vf(adev)) {
+ xgpu_nv_mailbox_put_irq(adev);
+ } else {
+ if (amdgpu_ip_block_test_and_clear_marker(ip_block, AMDGPU_MARKER_IRQ0))
+ amdgpu_irq_put(adev, &adev->nbio.ras_err_event_athub_irq, 0);
+ }
+
+ return 0;
+}
+
static int soc21_common_sw_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -877,15 +898,6 @@ static int soc21_common_hw_fini(struct amdgpu_ip_block *ip_block)
* selfring doorbell.
*/
adev->nbio.funcs->enable_doorbell_aperture(adev, false);
- adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
-
- if (amdgpu_sriov_vf(adev)) {
- xgpu_nv_mailbox_put_irq(adev);
- } else {
- if (adev->nbio.ras &&
- adev->nbio.ras_err_event_athub_irq.funcs)
- amdgpu_irq_put(adev, &adev->nbio.ras_err_event_athub_irq, 0);
- }
return 0;
}
@@ -1000,6 +1012,7 @@ static const struct amd_ip_funcs soc21_common_ip_funcs = {
.name = "soc21_common",
.early_init = soc21_common_early_init,
.late_init = soc21_common_late_init,
+ .early_fini = soc21_common_early_fini,
.sw_init = soc21_common_sw_init,
.sw_fini = soc21_common_sw_fini,
.hw_init = soc21_common_hw_init,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
index b20dc81dc257..5a4a04834ecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc24.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
@@ -455,6 +455,18 @@ static int soc24_common_late_init(struct amdgpu_ip_block *ip_block)
return 0;
}
+static int soc24_common_early_fini(struct amdgpu_ip_block *ip_block)
+{
+ struct amdgpu_device *adev = ip_block->adev;
+
+ adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
+
+ if (amdgpu_sriov_vf(adev))
+ xgpu_nv_mailbox_put_irq(adev);
+
+ return 0;
+}
+
static int soc24_common_sw_init(struct amdgpu_ip_block *ip_block)
{
struct amdgpu_device *adev = ip_block->adev;
@@ -504,10 +516,6 @@ static int soc24_common_hw_fini(struct amdgpu_ip_block *ip_block)
* selfring doorbell.
*/
adev->nbio.funcs->enable_doorbell_aperture(adev, false);
- adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false);
-
- if (amdgpu_sriov_vf(adev))
- xgpu_nv_mailbox_put_irq(adev);
return 0;
}
@@ -590,6 +598,7 @@ static const struct amd_ip_funcs soc24_common_ip_funcs = {
.name = "soc24_common",
.early_init = soc24_common_early_init,
.late_init = soc24_common_late_init,
+ .early_fini = soc24_common_early_fini,
.sw_init = soc24_common_sw_init,
.sw_fini = soc24_common_sw_fini,
.hw_init = soc24_common_hw_init,
--
2.43.5
* Re:
2025-01-08 13:59 Jiang Liu
` (12 preceding siblings ...)
2025-01-08 14:00 ` [RFC PATCH 13/13] drm/amdgpu/asic: make ip block operations symmetric by .early_fini() Jiang Liu
@ 2025-01-08 14:10 ` Christian König
2025-01-08 16:33 ` Re: Mario Limonciello
14 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2025-01-08 14:10 UTC (permalink / raw)
To: Jiang Liu, alexander.deucher, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Am 08.01.25 um 14:59 schrieb Jiang Liu:
> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
>
> Recently we were testing suspend/resume functionality with AMD GPUs,
> we have encountered several resource tracking related bugs, such as
> double buffer free, use after free and unbalanced irq reference count.
>
> We have tried to solve these issues case by case, but found that may
> not be the right way. Especially about the unbalanced irq reference
> count, there will be new issues appear once we fixed the current known
> issues. After analyzing related source code, we found that there may be
> some fundamental implementaion flaws behind these resource tracking
> issues.
In general please run your patches through checkpatch.pl. There are
quite a number of style issues with those code changes.
>
> The amdgpu driver has two major state machines to driver the device
> management flow, one is for ip blocks, the other is for ras blocks.
> The hook points defined in struct amd_ip_funcs for device setup/teardown
> are symmetric, but the implementation is asymmetric, sometime even
> ambiguous. The most obvious two issues we noticed are:
> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put()
> are called from .hw_fini() instead of .early_fini().
Yes and if I remember correctly that is absolutely intentional.
IRQs can't be enabled unless all IP blocks are up and running because
otherwise the IRQ handler sometimes doesn't have the necessary
functionality at hand.
But for HW fini we only disable IRQs right before we actually tear down
the HW state because we need them for operational feedback, e.g. ring
buffer completion interrupts for teardown commands.
Regards,
Christian.
> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't
> match the way to set those flags.
>
> When taking device suspend/resume into account, in addition to device
> probe/remove, things get much more complex. Some issues arise because
> many suspend/resume implementations directly reuse .hw_init/.hw_fini/
> .late_init hook points.
>
> So we try to fix those issues by two enhancements/refinements to current
> device management state machines.
>
> The first change is to make the ip block state machine and associated
> status flags work in stack-like way as below:
> Callback Status Flags
> early_init: valid = true
> sw_init: sw = true
> hw_init: hw = true
> late_init: late_initialized = true
> early_fini: late_initialized = false
> hw_fini: hw = false
> sw_fini: sw = false
> late_fini: valid = false
>
> Also do the same thing for ras block state machine, though it's much
> more simpler.
>
> The second change is fine tune the overall device management work
> flow as below:
> 1. amdgpu_driver_load_kms()
> amdgpu_device_init()
> amdgpu_device_ip_early_init()
> ip_blocks[i].early_init()
> ip_blocks[i].status.valid = true
> amdgpu_device_ip_init()
> amdgpu_ras_init()
> ip_blocks[i].sw_init()
> ip_blocks[i].status.sw = true
> ip_blocks[i].hw_init()
> ip_blocks[i].status.hw = true
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
>
> 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
> amdgpu_device_suspend()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> amdgpu_ras_feature_disable()
> amdgpu_ras_suspend()
> amdgpu_ras_disable_all_features()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].suspend()
>
> 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
> amdgpu_device_resume()
> amdgpu_device_ip_resume()
> ip_blocks[i].resume()
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
> amdgpu_ras_resume()
> amdgpu_ras_enable_all_features()
>
> 4. amdgpu_driver_unload_kms()
> amdgpu_device_fini_hw()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].hw_fini()
> ip_blocks[i].status.hw = false
>
> 5. amdgpu_driver_release_kms()
> amdgpu_device_fini_sw()
> amdgpu_device_ip_fini()
> ip_blocks[i].sw_fini()
> ip_blocks[i].status.sw = false
> --- ip_blocks[i].status.valid = false
> +++ amdgpu_ras_fini()
> ip_blocks[i].late_fini()
> +++ ip_blocks[i].status.valid = false
> --- ip_blocks[i].status.late_initialized = false
> --- amdgpu_ras_fini()
>
> The main changes include:
> 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
> Currently there's only one ip block which provides `early_fini`
> callback. We have add a check of `in_s3` to keep current behavior in
> function amdgpu_dm_early_fini(). So there should be no functional
> changes.
> 2) set ip_blocks[i].status.late_initialized to false after calling
> callback `early_fini`. We have auditted all usages of the
> late_initialized flag and no functional changes found.
> 3) only set ip_blocks[i].status.valid = false after calling the
> `late_fini` callback.
> 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.
>
> Then we try to refine each subsystem, such as nbio, asic, gfx, gmc,
> ras etc, to follow the new design. Currently we have only taken the
> nbio and asic as examples to show the proposed changes. Once we have
> confirmed that's the right way to go, we will handle the lefting
> subsystems.
>
> This is in early stage and requesting for comments, any comments and
> suggestions are welcomed!
> Jiang Liu (13):
> amdgpu: wrong array index to get ip block for PSP
> drm/admgpu: add helper functions to track status for ras manager
> drm/amdgpu: add a flag to track ras debugfs creation status
> drm/amdgpu: free all resources on error recovery path of
> amdgpu_ras_init()
> drm/amdgpu: introduce a flag to track refcount held for features
> drm/amdgpu: enhance amdgpu_ras_block_late_fini()
> drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR
> drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini()
> drm/amdgpu: make IP block state machine works in stack like way
> drm/admgpu: make device state machine work in stack like way
> drm/amdgpu/sdma: improve the way to manage irq reference count
> drm/amdgpu/nbio: improve the way to manage irq reference count
> drm/amdgpu/asic: make ip block operations symmetric by .early_fini()
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 +
> drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 +
> drivers/gpu/drm/amd/amdgpu/nv.c | 14 +-
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 -
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +--
> drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++---
> drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++--
> drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++-
> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +
> 25 files changed, 326 insertions(+), 118 deletions(-)
>
* Re: [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP
2025-01-08 13:59 ` [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP Jiang Liu
@ 2025-01-08 15:02 ` Alex Deucher
0 siblings, 0 replies; 27+ messages in thread
From: Alex Deucher @ 2025-01-08 15:02 UTC (permalink / raw)
To: Jiang Liu
Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello,
Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
Applied this one. Thanks!
Alex
On Wed, Jan 8, 2025 at 9:00 AM Jiang Liu <gerry@linux.alibaba.com> wrote:
>
> The adev->ip_blocks array is not indexed by AMD_IP_BLOCK_TYPE_xxx,
> instead we should call amdgpu_device_ip_get_ip_block() to get the
> corresponding IP block oject.
>
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 0b1e280ee228..25c06c6c8a2a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -3830,10 +3830,12 @@ static ssize_t psp_usbc_pd_fw_sysfs_read(struct device *dev,
> {
> struct drm_device *ddev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(ddev);
> + struct amdgpu_ip_block *ip_block;
> uint32_t fw_ver;
> int ret;
>
> - if (!adev->ip_blocks[AMD_IP_BLOCK_TYPE_PSP].status.late_initialized) {
> + ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_PSP);
> + if (ip_block == NULL || !ip_block->status.late_initialized) {
> dev_info(adev->dev, "PSP block is not ready yet\n.");
> return -EBUSY;
> }
> @@ -3862,8 +3864,10 @@ static ssize_t psp_usbc_pd_fw_sysfs_write(struct device *dev,
> struct amdgpu_bo *fw_buf_bo = NULL;
> uint64_t fw_pri_mc_addr;
> void *fw_pri_cpu_addr;
> + struct amdgpu_ip_block *ip_block;
>
> - if (!adev->ip_blocks[AMD_IP_BLOCK_TYPE_PSP].status.late_initialized) {
> + ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_PSP);
> + if (ip_block == NULL || !ip_block->status.late_initialized) {
> dev_err(adev->dev, "PSP block is not ready yet.");
> return -EBUSY;
> }
> --
> 2.43.5
>
* Re:
2025-01-08 13:59 Jiang Liu
` (13 preceding siblings ...)
2025-01-08 14:10 ` Christian König
@ 2025-01-08 16:33 ` Mario Limonciello
2025-01-09 5:34 ` Re: Gerry Liu
14 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-01-08 16:33 UTC (permalink / raw)
To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2,
xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
On 1/8/2025 07:59, Jiang Liu wrote:
> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
I'm not sure how this happened, but your subject didn't end up in the
subject of the thread on patch 0 so the thread just looks like an
unsubjected thread.
>
> Recently we were testing suspend/resume functionality with AMD GPUs,
> we have encountered several resource tracking related bugs, such as
> double buffer free, use after free and unbalanced irq reference count.
Can you share more about how you were hitting these issues? Are they
specific to S3 or to s2idle flows? dGPU or APU?
Are they only with SRIOV?
Is there anything to do with the host influencing the failures to
happen, or are you contriving the failures to find the bugs?
I know we've had some reports about resource tracking warnings on the
reset flows, but I haven't heard much about suspend/resume.
>
> We have tried to solve these issues case by case, but found that may
> not be the right way. Especially about the unbalanced irq reference
> count, there will be new issues appear once we fixed the current known
> issues. After analyzing related source code, we found that there may be
> some fundamental implementaion flaws behind these resource tracking
implementation
> issues.
>
> The amdgpu driver has two major state machines to driver the device
> management flow, one is for ip blocks, the other is for ras blocks.
> The hook points defined in struct amd_ip_funcs for device setup/teardown
> are symmetric, but the implementation is asymmetric, sometime even
> ambiguous. The most obvious two issues we noticed are:
> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put()
> are called from .hw_fini() instead of .early_fini().
> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't
> match the way to set those flags.
>
> When taking device suspend/resume into account, in addition to device
> probe/remove, things get much more complex. Some issues arise because
> many suspend/resume implementations directly reuse .hw_init/.hw_fini/
> .late_init hook points.
>
> So we try to fix those issues by two enhancements/refinements to current
> device management state machines.
>
> The first change is to make the ip block state machine and associated
> status flags work in stack-like way as below:
> Callback Status Flags
> early_init: valid = true
> sw_init: sw = true
> hw_init: hw = true
> late_init: late_initialized = true
> early_fini: late_initialized = false
> hw_fini: hw = false
> sw_fini: sw = false
> late_fini: valid = false
At a high level this makes sense to me, but I'd just call 'late' or
'late_init'.
Another idea, if you make it stack-like, is to use a true enum for the
state machine and store it all in one variable, along the lines of the
sketch below.
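A rough sketch of what that enum could look like (the names are
illustrative, not an existing amdgpu definition):
```c
/* Single state value replacing the valid/sw/hw/late_initialized bools;
 * init walks up the ladder and fini/suspend walks back down it. */
enum amdgpu_ip_state {
	AMDGPU_STATE_NONE = 0,		/* before early_init() */
	AMDGPU_STATE_EARLY_INIT,	/* was: status.valid */
	AMDGPU_STATE_SW_INIT,		/* was: status.sw */
	AMDGPU_STATE_HW_INIT,		/* was: status.hw */
	AMDGPU_STATE_LATE_INIT,		/* was: status.late_initialized */
};
```
With the values ordered like this, a single comparison such as
`status < AMDGPU_STATE_HW_INIT` can stand in for several of today's
separate flag checks.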
>
> Also do the same thing for ras block state machine, though it's much
> more simpler.
>
> The second change is fine tune the overall device management work
> flow as below:
> 1. amdgpu_driver_load_kms()
> amdgpu_device_init()
> amdgpu_device_ip_early_init()
> ip_blocks[i].early_init()
> ip_blocks[i].status.valid = true
> amdgpu_device_ip_init()
> amdgpu_ras_init()
> ip_blocks[i].sw_init()
> ip_blocks[i].status.sw = true
> ip_blocks[i].hw_init()
> ip_blocks[i].status.hw = true
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
>
> 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
> amdgpu_device_suspend()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> amdgpu_ras_feature_disable()
> amdgpu_ras_suspend()
> amdgpu_ras_disable_all_features()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].suspend()
>
> 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
> amdgpu_device_resume()
> amdgpu_device_ip_resume()
> ip_blocks[i].resume()
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
> amdgpu_ras_resume()
> amdgpu_ras_enable_all_features()
>
> 4. amdgpu_driver_unload_kms()
> amdgpu_device_fini_hw()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].hw_fini()
> ip_blocks[i].status.hw = false
>
> 5. amdgpu_driver_release_kms()
> amdgpu_device_fini_sw()
> amdgpu_device_ip_fini()
> ip_blocks[i].sw_fini()
> ip_blocks[i].status.sw = false
> --- ip_blocks[i].status.valid = false
> +++ amdgpu_ras_fini()
> ip_blocks[i].late_fini()
> +++ ip_blocks[i].status.valid = false
> --- ip_blocks[i].status.late_initialized = false
> --- amdgpu_ras_fini()
>
> The main changes include:
> 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
> Currently there's only one ip block which provides `early_fini`
> callback. We have add a check of `in_s3` to keep current behavior in
> function amdgpu_dm_early_fini(). So there should be no functional
> changes.
> 2) set ip_blocks[i].status.late_initialized to false after calling
> callback `early_fini`. We have auditted all usages of the
> late_initialized flag and no functional changes found.
> 3) only set ip_blocks[i].status.valid = false after calling the
> `late_fini` callback.
> 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.
>
> Then we try to refine each subsystem, such as nbio, asic, gfx, gmc,
> ras etc, to follow the new design. Currently we have only taken the
> nbio and asic as examples to show the proposed changes. Once we have
> confirmed that's the right way to go, we will handle the lefting
> subsystems.
>
> This is in early stage and requesting for comments, any comments and
> suggestions are welcomed!
> Jiang Liu (13):
> amdgpu: wrong array index to get ip block for PSP
> drm/admgpu: add helper functions to track status for ras manager
> drm/amdgpu: add a flag to track ras debugfs creation status
> drm/amdgpu: free all resources on error recovery path of
> amdgpu_ras_init()
> drm/amdgpu: introduce a flag to track refcount held for features
> drm/amdgpu: enhance amdgpu_ras_block_late_fini()
> drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR
> drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini()
> drm/amdgpu: make IP block state machine works in stack like way
> drm/admgpu: make device state machine work in stack like way
> drm/amdgpu/sdma: improve the way to manage irq reference count
> drm/amdgpu/nbio: improve the way to manage irq reference count
> drm/amdgpu/asic: make ip block operations symmetric by .early_fini()
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 +
> drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 +
> drivers/gpu/drm/amd/amdgpu/nv.c | 14 +-
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 -
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +--
> drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++---
> drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++--
> drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++-
> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +
> 25 files changed, 326 insertions(+), 118 deletions(-)
>
* Re: [RFC PATCH 09/13] drm/amdgpu: make IP block state machine works in stack like way
2025-01-08 14:00 ` [RFC PATCH 09/13] drm/amdgpu: make IP block state machine works in stack like way Jiang Liu
@ 2025-01-08 17:04 ` Mario Limonciello
0 siblings, 0 replies; 27+ messages in thread
From: Mario Limonciello @ 2025-01-08 17:04 UTC (permalink / raw)
To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2,
xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
On 1/8/2025 08:00, Jiang Liu wrote:
> There are some mismatch between IP block state machine and its associated
> status flags, especially about the meaning of `status.late_initialized`.
> So let's make the state machine and associated status flas work in stack-like
s/flas/flag/
> way as below:
> Callback Status
> early_init: valid = true
> sw_init: sw = true
> hw_init: hw = true
> late_init: late_initialized = true
> early_fini: late_initialized = false
> hw_fini: hw = false
> sw_fini: sw = false
> late_fini: valid = false
Would you mind also putting this into kerneldoc? I think it would be
really helpful for others to be able to follow in the future why there
are so many variables and what they all mean.
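Something along these lines next to the IP block status struct (assuming
it keeps its current amdgpu_ip_block_status name and fields) would already
cover it; the wording is only a sketch:
```c
/**
 * struct amdgpu_ip_block_status - life-cycle flags of an IP block
 * @valid:            set by .early_init(), cleared by .late_fini()
 * @sw:               set by .sw_init(), cleared by .sw_fini()
 * @hw:               set by .hw_init(), cleared by .hw_fini()
 * @late_initialized: set by .late_init(), cleared by .early_fini()
 *
 * The flags are managed in stack order: each fini callback clears
 * exactly the flag that its matching init callback set.
 */
```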
>
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0e69c7c7fe1f..36a33a391411 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3332,6 +3332,8 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
> DRM_DEBUG("early_fini of IP block <%s> failed %d\n",
> adev->ip_blocks[i].version->funcs->name, r);
> }
> +
> + adev->ip_blocks[i].status.late_initialized = false;
> }
>
> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> @@ -3407,15 +3409,14 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> adev->ip_blocks[i].version->funcs->name, r);
> }
> adev->ip_blocks[i].status.sw = false;
> - adev->ip_blocks[i].status.valid = false;
> }
>
> for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> - if (!adev->ip_blocks[i].status.late_initialized)
> + if (!adev->ip_blocks[i].status.valid)
> continue;
> if (adev->ip_blocks[i].version->funcs->late_fini)
> adev->ip_blocks[i].version->funcs->late_fini(&adev->ip_blocks[i]);
> - adev->ip_blocks[i].status.late_initialized = false;
> + adev->ip_blocks[i].status.valid = false;
> }
>
> amdgpu_ras_fini(adev);
* Re: [RFC PATCH 10/13] drm/admgpu: make device state machine work in stack like way
2025-01-08 14:00 ` [RFC PATCH 10/13] drm/admgpu: make device state machine work " Jiang Liu
@ 2025-01-08 17:13 ` Mario Limonciello
0 siblings, 0 replies; 27+ messages in thread
From: Mario Limonciello @ 2025-01-08 17:13 UTC (permalink / raw)
To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2,
xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
On 1/8/2025 08:00, Jiang Liu wrote:
> Make the device state machine work in stack like way to better support
> suspend/resume by following changes:
>
> 1. amdgpu_driver_load_kms()
> amdgpu_device_init()
> amdgpu_device_ip_early_init()
> ip_blocks[i].early_init()
> ip_blocks[i].status.valid = true
> amdgpu_device_ip_init()
> amdgpu_ras_init()
> ip_blocks[i].sw_init()
> ip_blocks[i].status.sw = true
> ip_blocks[i].hw_init()
> ip_blocks[i].status.hw = true
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
>
> 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
> amdgpu_device_suspend()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> amdgpu_ras_feature_disable()
> amdgpu_ras_suspend()
> amdgpu_ras_disable_all_features()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].suspend()
>
> 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
> amdgpu_device_resume()
> amdgpu_device_ip_resume()
> ip_blocks[i].resume()
> amdgpu_device_ip_late_init()
> ip_blocks[i].late_init()
> ip_blocks[i].status.late_initialized = true
> amdgpu_ras_late_init()
> ras_blocks[i].ras_late_init()
> amdgpu_ras_feature_enable_on_boot()
> amdgpu_ras_resume()
> amdgpu_ras_enable_all_features()
>
> 4. amdgpu_driver_unload_kms()
> amdgpu_device_fini_hw()
> amdgpu_ras_early_fini()
> ras_blocks[i].ras_early_fini()
> +++ ip_blocks[i].early_fini()
> +++ ip_blocks[i].status.late_initialized = false
> ip_blocks[i].hw_fini()
> ip_blocks[i].status.hw = false
>
> 5. amdgpu_driver_release_kms()
> amdgpu_device_fini_sw()
> amdgpu_device_ip_fini()
> ip_blocks[i].sw_fini()
> ip_blocks[i].status.sw = false
> --- ip_blocks[i].status.valid = false
> +++ amdgpu_ras_fini()
> ip_blocks[i].late_fini()
> +++ ip_blocks[i].status.valid = false
> --- ip_blocks[i].status.late_initialized = false
> --- amdgpu_ras_fini()
>
> The main changes include:
> 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
> Currently there's only one ip block which provides `early_fini`
> callback. We have add a check of `in_s3` to keep current behavior in
> function amdgpu_dm_early_fini(). So there should be no functional
> changes.
FWIW, you added more than just the in_s3 check (which is correct, so
update the commit message!).
> 2) set ip_blocks[i].status.late_initialized to false after calling
> callback `early_fini`. We have auditted all usages of the
> late_initialized flag and no functional changes found.
> 3) only set ip_blocks[i].status.valid = false after calling the
> `late_fini` callback.
> 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.
>
> There's one more task left to analyze GPU reset related state machine
> transitions.
>
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 +++++++++++++++++--
> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
> 2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 36a33a391411..5c6b39e5cdaa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3411,6 +3411,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> adev->ip_blocks[i].status.sw = false;
> }
>
> + amdgpu_ras_fini(adev);
> +
> for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> if (!adev->ip_blocks[i].status.valid)
> continue;
> @@ -3419,8 +3421,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> adev->ip_blocks[i].status.valid = false;
> }
>
> - amdgpu_ras_fini(adev);
> -
> return 0;
> }
>
> @@ -3478,6 +3478,24 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
> if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> dev_warn(adev->dev, "Failed to disallow df cstate");
>
> + for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
This is the 37th time we have a for loop that walks the IP blocks.
I'm thinking it would be good to have a for_each_ip_block macro,
something like the sketch below. What do you think?
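Purely as a sketch of the idea, not an existing macro:
```c
/* Forward walk over the IP blocks, as used on the init paths. */
#define for_each_ip_block(adev, i) \
	for ((i) = 0; (i) < (adev)->num_ip_blocks; (i)++)

/* Reverse walk, as used on the fini/suspend paths. */
#define for_each_ip_block_reverse(adev, i) \
	for ((i) = (adev)->num_ip_blocks - 1; (i) >= 0; (i)--)
```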
> + if (!adev->ip_blocks[i].status.valid)
> + continue;
> + if (!adev->ip_blocks[i].status.late_initialized)
> + continue;
If you take my idea from the cover letter of moving the state machine into
a single variable, I think that some of these cases can be a little bit
cleaner, i.e. if it was never valid it wouldn't have progressed to the
'hw' or 'sw' states.
This check (and other similar ones) could turn into something like this:
if (adev->ip_blocks[i].status != AMDGPU_STATE_LATE_INIT)
continue;
> +
> + if (adev->ip_blocks[i].version->funcs->early_fini) {
> + r = adev->ip_blocks[i].version->funcs->early_fini(&adev->ip_blocks[i]);
> + if (r) {
> + DRM_ERROR(" of IP block <%s> failed %d\n",
> + adev->ip_blocks[i].version->funcs->name, r);
> + return r;
> + }
> + }
> +
> + adev->ip_blocks[i].status.late_initialized = false;
> + }
> +
> for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
> if (!adev->ip_blocks[i].status.valid)
> continue;
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index f622eb1551df..33a1a795c761 100755
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -2175,6 +2175,9 @@ static int amdgpu_dm_early_fini(struct amdgpu_ip_block *ip_block)
> {
> struct amdgpu_device *adev = ip_block->adev;
>
> + if (adev->in_s0ix || adev->in_s3 || adev->in_s4 || adev->in_suspend)
> + return 0;
> +
I think this set of changes to display code (amdgpu_dm) should be split
into its own patch and stand on its own.
> amdgpu_dm_audio_fini(adev);
>
> return 0;
* Re: [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status
2025-01-08 13:59 ` [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status Jiang Liu
@ 2025-01-08 17:19 ` Mario Limonciello
2025-01-10 3:19 ` Gerry Liu
0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-01-08 17:19 UTC (permalink / raw)
To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan,
airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2,
xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
On 1/8/2025 07:59, Jiang Liu wrote:
> Add a flag to track ras debugfs creation status, to avoid possible
> incorrect reference count management for ras block object in function
> amdgpu_ras_aca_is_supported().
Rather than taking a marker position, why not just check for
obj->fs_data.debugfs_name to be non NULL in amdgpu_ras_fs_fini()?
>
> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++++++--
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 32941f29507c..2ef7d3102be3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -378,7 +378,7 @@ enum amdgpu_marker {
> AMDGPU_MARKER_IRQ6 = 6,
> AMDGPU_MARKER_IRQ7 = 7,
> AMDGPU_MARKER_IRQ_MAX = 47,
> - AMDGPU_MARKER_DEBUGFS = 63,
> + AMDGPU_MARKER_RAS_DEBUGFS = 63,
Any particular reason you jumped to 63 in this patch and then counted
down in the next one? I.e. why not put it at 48 (and then 49 for the
next one)?
> };
>
> #define AMDGPU_MARKER_INDEX_IRQ(idx) (AMDGPU_MARKER_INDEX_IRQ0 + (idx))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 6d52e22691f7..efd72b07a185 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1996,7 +1996,8 @@ static void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
> {
> struct ras_manager *obj = amdgpu_ras_find_obj(adev, &head->head);
>
> - if (!obj || !dir)
> + if (!obj || !dir ||
> + amdgpu_ras_test_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS))
> return;
>
> get_obj(obj);
> @@ -2007,6 +2008,8 @@ static void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
>
> debugfs_create_file(obj->fs_data.debugfs_name, S_IWUGO | S_IRUGO, dir,
> obj, &amdgpu_ras_debugfs_ops);
> +
> + amdgpu_ras_set_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS);
> }
>
> static bool amdgpu_ras_aca_is_supported(struct amdgpu_device *adev)
> @@ -2134,7 +2137,9 @@ static int amdgpu_ras_fs_fini(struct amdgpu_device *adev)
> if (IS_ENABLED(CONFIG_DEBUG_FS)) {
> list_for_each_entry_safe(con_obj, tmp, &con->head, node) {
> ip_obj = amdgpu_ras_find_obj(adev, &con_obj->head);
> - if (ip_obj)
> + if (ip_obj &&
> + amdgpu_ras_test_and_clear_marker(adev, &ip_obj->head,
> + AMDGPU_MARKER_RAS_DEBUGFS))
> put_obj(ip_obj);
> }
> }
* Re:
2025-01-08 16:33 ` Re: Mario Limonciello
@ 2025-01-09 5:34 ` Gerry Liu
2025-01-09 17:10 ` Re: Mario Limonciello
0 siblings, 1 reply; 27+ messages in thread
From: Gerry Liu @ 2025-01-09 5:34 UTC (permalink / raw)
To: Mario Limonciello
Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang,
Kent.Russell, Shuo Liu, amd-gfx
> On Jan 9, 2025, at 00:33, Mario Limonciello <mario.limonciello@amd.com> wrote:
>
> On 1/8/2025 07:59, Jiang Liu wrote:
>> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
>
> I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0 so the thread just looks like an unsubjected thread.
Maybe it’s caused by one extra blank line at the header.
>
>> Recently we were testing suspend/resume functionality with AMD GPUs,
>> we have encountered several resource tracking related bugs, such as
>> double buffer free, use after free and unbalanced irq reference count.
>
> Can you share more aobut how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU?
> Are they only with SRIOV?
>
> Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs?
>
> I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume.
We are investigating the development of some advanced product features based on amdgpu suspend/resume.
So we started by testing the suspend/resume functionality of AMD 308x GPUs with the following simple script:
```
echo platform > /sys/power/pm_test
i=0
while true; do
echo mem > /sys/power/state
let i=i+1
echo $i
sleep 1
done
```
It succeeds on the first and second iterations but always fails on the following iterations on a bare metal server with eight MI308X GPUs.
With some investigation we found that the gpu asic should be reset during the test, so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181).
While analyzing and root-causing the failure, we encountered several crashes, resource leaks and false alarms.
So I have worked out patch sets to solve the issues we encountered. The other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html
With sriov in single VF mode, resume always fails. It seems some contexts/vram buffers get lost during suspend and haven't been restored on resume, which causes the failure.
We haven't tested sriov in multiple VFs mode yet. We need more help from the AMD side to make SR work for SRIOV :)
>
>> We have tried to solve these issues case by case, but found that may
>> not be the right way. Especially about the unbalanced irq reference
>> count, there will be new issues appear once we fixed the current known
>> issues. After analyzing related source code, we found that there may be
>> some fundamental implementaion flaws behind these resource tracking
>
> implementation
>
>> issues.
>> The amdgpu driver has two major state machines to driver the device
>> management flow, one is for ip blocks, the other is for ras blocks.
>> The hook points defined in struct amd_ip_funcs for device setup/teardown
>> are symmetric, but the implementation is asymmetric, sometime even
>> ambiguous. The most obvious two issues we noticed are:
>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put()
>> are called from .hw_fini() instead of .early_fini().
>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't
>> match the way to set those flags.
>> When taking device suspend/resume into account, in addition to device
>> probe/remove, things get much more complex. Some issues arise because
>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/
>> .late_init hook points.
>>
>> So we try to fix those issues by two enhancements/refinements to current
>> device management state machines.
>> The first change is to make the ip block state machine and associated
>> status flags work in stack-like way as below:
>> Callback Status Flags
>> early_init: valid = true
>> sw_init: sw = true
>> hw_init: hw = true
>> late_init: late_initialized = true
>> early_fini: late_initialized = false
>> hw_fini: hw = false
>> sw_fini: sw = false
>> late_fini: valid = false
>
> At a high level this makes sense to me, but I'd just call 'late' or 'late_init'.
>
> Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable.
I will add a patch to convert those bool flags into an enum.
Thanks,
Gerry
>
>> Also do the same thing for ras block state machine, though it's much
>> more simpler.
>> The second change is fine tune the overall device management work
>> flow as below:
>> 1. amdgpu_driver_load_kms()
>> amdgpu_device_init()
>> amdgpu_device_ip_early_init()
>> ip_blocks[i].early_init()
>> ip_blocks[i].status.valid = true
>> amdgpu_device_ip_init()
>> amdgpu_ras_init()
>> ip_blocks[i].sw_init()
>> ip_blocks[i].status.sw = true
>> ip_blocks[i].hw_init()
>> ip_blocks[i].status.hw = true
>> amdgpu_device_ip_late_init()
>> ip_blocks[i].late_init()
>> ip_blocks[i].status.late_initialized = true
>> amdgpu_ras_late_init()
>> ras_blocks[i].ras_late_init()
>> amdgpu_ras_feature_enable_on_boot()
>> 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff()
>> amdgpu_device_suspend()
>> amdgpu_ras_early_fini()
>> ras_blocks[i].ras_early_fini()
>> amdgpu_ras_feature_disable()
>> amdgpu_ras_suspend()
>> amdgpu_ras_disable_all_features()
>> +++ ip_blocks[i].early_fini()
>> +++ ip_blocks[i].status.late_initialized = false
>> ip_blocks[i].suspend()
>> 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore()
>> amdgpu_device_resume()
>> amdgpu_device_ip_resume()
>> ip_blocks[i].resume()
>> amdgpu_device_ip_late_init()
>> ip_blocks[i].late_init()
>> ip_blocks[i].status.late_initialized = true
>> amdgpu_ras_late_init()
>> ras_blocks[i].ras_late_init()
>> amdgpu_ras_feature_enable_on_boot()
>> amdgpu_ras_resume()
>> amdgpu_ras_enable_all_features()
>> 4. amdgpu_driver_unload_kms()
>> amdgpu_device_fini_hw()
>> amdgpu_ras_early_fini()
>> ras_blocks[i].ras_early_fini()
>> +++ ip_blocks[i].early_fini()
>> +++ ip_blocks[i].status.late_initialized = false
>> ip_blocks[i].hw_fini()
>> ip_blocks[i].status.hw = false
>> 5. amdgpu_driver_release_kms()
>> amdgpu_device_fini_sw()
>> amdgpu_device_ip_fini()
>> ip_blocks[i].sw_fini()
>> ip_blocks[i].status.sw = false
>> --- ip_blocks[i].status.valid = false
>> +++ amdgpu_ras_fini()
>> ip_blocks[i].late_fini()
>> +++ ip_blocks[i].status.valid = false
>> --- ip_blocks[i].status.late_initialized = false
>> --- amdgpu_ras_fini()
>> The main changes include:
>> 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend().
>> Currently there's only one ip block which provides `early_fini`
>> callback. We have add a check of `in_s3` to keep current behavior in
>> function amdgpu_dm_early_fini(). So there should be no functional
>> changes.
>> 2) set ip_blocks[i].status.late_initialized to false after calling
>> callback `early_fini`. We have auditted all usages of the
>> late_initialized flag and no functional changes found.
>> 3) only set ip_blocks[i].status.valid = false after calling the
>> `late_fini` callback.
>> 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.
>> Then we try to refine each subsystem, such as nbio, asic, gfx, gmc,
>> ras etc, to follow the new design. Currently we have only taken the
>> nbio and asic as examples to show the proposed changes. Once we have
>> confirmed that's the right way to go, we will handle the lefting
>> subsystems.
>> This is in early stage and requesting for comments, any comments and
>> suggestions are welcomed!
>> Jiang Liu (13):
>> amdgpu: wrong array index to get ip block for PSP
>> drm/admgpu: add helper functions to track status for ras manager
>> drm/amdgpu: add a flag to track ras debugfs creation status
>> drm/amdgpu: free all resources on error recovery path of
>> amdgpu_ras_init()
>> drm/amdgpu: introduce a flag to track refcount held for features
>> drm/amdgpu: enhance amdgpu_ras_block_late_fini()
>> drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR
>> drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini()
>> drm/amdgpu: make IP block state machine works in stack like way
>> drm/admgpu: make device state machine work in stack like way
>> drm/amdgpu/sdma: improve the way to manage irq reference count
>> drm/amdgpu/nbio: improve the way to manage irq reference count
>> drm/amdgpu/asic: make ip block operations symmetric by .early_fini()
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++-----
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 +
>> drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 +
>> drivers/gpu/drm/amd/amdgpu/nv.c | 14 +-
>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 -
>> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +--
>> drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++---
>> drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++--
>> drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++-
>> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +
>> 25 files changed, 326 insertions(+), 118 deletions(-)
* Re:
2025-01-09 5:34 ` Re: Gerry Liu
@ 2025-01-09 17:10 ` Mario Limonciello
2025-01-13 1:19 ` Re: Gerry Liu
0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-01-09 17:10 UTC (permalink / raw)
To: Gerry Liu
Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang,
Kent.Russell, Shuo Liu, amd-gfx
General note - don't use HTML for mailing list communication.
I'm not sure if Apple Mail lets you switch this around.
If not, you might try using Thunderbird instead. You can choose to reply
in plain text or HTML by holding shift when you hit "reply all".
For my reply I'll convert it to plain text; please see inline below.
On 1/8/2025 23:34, Gerry Liu wrote:
>
>
>> On Jan 9, 2025, at 00:33, Mario Limonciello <mario.limonciello@amd.com> wrote:
>>
>> On 1/8/2025 07:59, Jiang Liu wrote:
>>> Subject: [RFC PATCH 00/13] Enhance device state machine to better
>>> support suspend/resume
>>
>> I'm not sure how this happened, but your subject didn't end up in the
>> subject of the thread on patch 0 so the thread just looks like an
>> unsubjected thread.
> Maybe it’s caused by one extra blank line at the header.
Yeah that might be it. Hopefully it doesn't happen on v2.
>
>>
>>> Recently we were testing suspend/resume functionality with AMD GPUs,
>>> we have encountered several resource tracking related bugs, such as
>>> double buffer free, use after free and unbalanced irq reference count.
>>
>> Can you share more aobut how you were hitting these issues? Are they
>> specific to S3 or to s2idle flows? dGPU or APU?
>> Are they only with SRIOV?
>>
>> Is there anything to do with the host influencing the failures to
>> happen, or are you contriving the failures to find the bugs?
>>
>> I know we've had some reports about resource tracking warnings on the
>> reset flows, but I haven't heard much about suspend/resume.
> We are investigating to develop some advanced product features based on
> amdgpu suspend/resume.
> So we started by tested the suspend/resume functionality of AMD 308x
> GPUs with the following simple script:
> ```
> echo platform > /sys/power/pm_test
> i=0
> while true; do
> echo mem > /sys/power/state
> let i=i+1
> echo $i
> sleep 1
> done
> ```
>
> It succeeds with the first and second iteration but always fails on
> following iterations on a bare metal servers with eight MI308X GPUs.
Can you share more about this server? Does it support suspend to ram or
a hardware backed suspend to idle? If you don't know, you can check
like this:
❯ cat /sys/power/mem_sleep
s2idle [deep]
If it's suspend to idle, what does the FACP indicate? You can do this
check to find out if you don't know.
❯ sudo cp /sys/firmware/acpi/tables/FACP /tmp
❯ sudo iasl -d /tmp/FACP
❯ grep "idle" -i /tmp/FACP.dsl
Low Power S0 Idle (V5) : 0
> With some investigation we found that the gpu asic should be reset
> during the test,
Yeah; but this comes back to my above questions. Typically there is an
assumption that the power rails are going to be cut in system suspend.
If that doesn't hold true, then you're doing a pure software suspend and
have found a series of issues in the driver with how that's handled.
> so we submitted a patch to fix the failure
> (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181)
Typically kernel patches don't go through that repo, they're discussed
on the mailing lists. Can you bring this patch for discussion on amd-gfx?
>
> During analyze and root-cause the failure, we have encountered several
> crashes, resource leakages and false alarms.
Yeah; I think you found some real issues.
> So I have worked out patch sets to solve issues we encountered. The
> other patch set is
> https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html
Thanks!
>
> With sriov in single VF mode, resume always fails. Seems some contexts/
> vram buffers get lost during suspend and haven’t be restored on resume,
> so cause failure.
> We haven’t tested sriov in multiple VFs mode yet. We need more help from
> AMD side to make SR work for SRIOV:)
>
>>
>>> We have tried to solve these issues case by case, but found that may
>>> not be the right way. Especially about the unbalanced irq reference
>>> count, there will be new issues appear once we fixed the current known
>>> issues. After analyzing related source code, we found that there may be
>>> some fundamental implementaion flaws behind these resource tracking
>>
>> implementation
>>
>>> issues.
>>> The amdgpu driver has two major state machines to driver the device
>>> management flow, one is for ip blocks, the other is for ras blocks.
>>> The hook points defined in struct amd_ip_funcs for device setup/teardown
>>> are symmetric, but the implementation is asymmetric, sometime even
>>> ambiguous. The most obvious two issues we noticed are:
>>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put()
>>> are called from .hw_fini() instead of .early_fini().
>>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't
>>> match the way to set those flags.
>>> When taking device suspend/resume into account, in addition to device
>>> probe/remove, things get much more complex. Some issues arise because
>>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/
>>> .late_init hook points.
>>>
>>> So we try to fix those issues by two enhancements/refinements to current
>>> device management state machines.
>>> The first change is to make the ip block state machine and associated
>>> status flags work in stack-like way as below:
>>> Callback Status Flags
>>> early_init: valid = true
>>> sw_init: sw = true
>>> hw_init: hw = true
>>> late_init: late_initialized = true
>>> early_fini: late_initialized = false
>>> hw_fini: hw = false
>>> sw_fini: sw = false
>>> late_fini: valid = false
>>
>> At a high level this makes sense to me, but I'd just call 'late' or
>> 'late_init'.
>>
>> Another idea if you make it stack like is to do it as a true enum for
>> the state machine and store it all in one variable.
> I will add a patch to convert those bool flags into an enum.
Thanks!
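For reference, a minimal sketch of what such a conversion could look like; every name below is an assumption for illustration and is not taken from any patch in this series:
```
/* Sketch only: one enum replacing the valid/sw/hw/late_initialized bools. */
enum amd_ip_block_state {
	AMD_IP_STATE_NONE = 0,		/* before early_init / after late_fini */
	AMD_IP_STATE_EARLY_INITED,	/* early_init done (was status.valid) */
	AMD_IP_STATE_SW_INITED,		/* sw_init done    (was status.sw) */
	AMD_IP_STATE_HW_INITED,		/* hw_init done    (was status.hw) */
	AMD_IP_STATE_LATE_INITED,	/* late_init done  (was status.late_initialized) */
};

/*
 * With the stack-like rule, each init step moves the state up exactly one
 * level and each fini step moves it back down one level, so a plain
 * comparison can decide whether a callback is allowed to run.
 */
static inline bool amd_ip_state_at_least(enum amd_ip_block_state cur,
					 enum amd_ip_block_state min)
{
	return cur >= min;
}
```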
* Re: [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status
2025-01-08 17:19 ` Mario Limonciello
@ 2025-01-10 3:19 ` Gerry Liu
2025-01-10 16:58 ` Mario Limonciello
0 siblings, 1 reply; 27+ messages in thread
From: Gerry Liu @ 2025-01-10 3:19 UTC (permalink / raw)
To: Mario Limonciello
Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2, xiaogang.chen,
Kent.Russell, shuox.liu, amd-gfx
> 2025年1月9日 01:19,Mario Limonciello <mario.limonciello@amd.com> 写道:
>
> On 1/8/2025 07:59, Jiang Liu wrote:
>> Add a flag to track ras debugfs creation status, to avoid possible
>> incorrect reference count management for ras block object in function
>> amdgpu_ras_aca_is_supported().
>
> Rather than taking a marker position, why not just check for
> obj->fs_data.debugfs_name to be non NULL in amdgpu_ras_fs_fini()?
I plan to use the marker as a common status-tracking mechanism, so I used a marker here :)
>
>> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++++++--
>> 2 files changed, 8 insertions(+), 3 deletions(-)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 32941f29507c..2ef7d3102be3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -378,7 +378,7 @@ enum amdgpu_marker {
>> AMDGPU_MARKER_IRQ6 = 6,
>> AMDGPU_MARKER_IRQ7 = 7,
>> AMDGPU_MARKER_IRQ_MAX = 47,
>> - AMDGPU_MARKER_DEBUGFS = 63,
>> + AMDGPU_MARKER_RAS_DEBUGFS = 63,
>
> Any particular reason you jumped to 63 in this patch and then counted down in the next one? IE why not throw it at 48 (and then 49 for next one)?
I’m not sure how many markers are needed for IRQ, so I split the space into two parts: one for irq and one for others.
>
>> };
>> #define AMDGPU_MARKER_INDEX_IRQ(idx) (AMDGPU_MARKER_INDEX_IRQ0 + (idx))
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index 6d52e22691f7..efd72b07a185 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -1996,7 +1996,8 @@ static void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
>> {
>> struct ras_manager *obj = amdgpu_ras_find_obj(adev, &head->head);
>> - if (!obj || !dir)
>> + if (!obj || !dir ||
>> + amdgpu_ras_test_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS))
>> return;
>> get_obj(obj);
>> @@ -2007,6 +2008,8 @@ static void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
>> debugfs_create_file(obj->fs_data.debugfs_name, S_IWUGO | S_IRUGO, dir,
>> obj, &amdgpu_ras_debugfs_ops);
>> +
>> + amdgpu_ras_set_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS);
>> }
>> static bool amdgpu_ras_aca_is_supported(struct amdgpu_device *adev)
>> @@ -2134,7 +2137,9 @@ static int amdgpu_ras_fs_fini(struct amdgpu_device *adev)
>> if (IS_ENABLED(CONFIG_DEBUG_FS)) {
>> list_for_each_entry_safe(con_obj, tmp, &con->head, node) {
>> ip_obj = amdgpu_ras_find_obj(adev, &con_obj->head);
>> - if (ip_obj)
>> + if (ip_obj &&
>> + amdgpu_ras_test_and_clear_marker(adev, &ip_obj->head,
>> + AMDGPU_MARKER_RAS_DEBUGFS))
>> put_obj(ip_obj);
>> }
>> }
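The amdgpu_ras_test_marker()/amdgpu_ras_set_marker()/amdgpu_ras_test_and_clear_marker() helpers are added by patch 02 of the series, which is not quoted in this thread. A rough sketch of how a bitmap-based implementation could look is below; the DECLARE_BITMAP(markers, 64) field on struct ras_manager and the helper bodies are assumptions, not the actual patch:
```
/* Assumed field added to struct ras_manager: DECLARE_BITMAP(markers, 64); */

static bool amdgpu_ras_test_marker(struct amdgpu_device *adev,
				   struct ras_common_if *head,
				   enum amdgpu_marker marker)
{
	struct ras_manager *obj = amdgpu_ras_find_obj(adev, head);

	/* Report whether the marker bit is currently set for this block. */
	return obj && test_bit(marker, obj->markers);
}

static void amdgpu_ras_set_marker(struct amdgpu_device *adev,
				  struct ras_common_if *head,
				  enum amdgpu_marker marker)
{
	struct ras_manager *obj = amdgpu_ras_find_obj(adev, head);

	if (obj)
		set_bit(marker, obj->markers);
}

static bool amdgpu_ras_test_and_clear_marker(struct amdgpu_device *adev,
					     struct ras_common_if *head,
					     enum amdgpu_marker marker)
{
	struct ras_manager *obj = amdgpu_ras_find_obj(adev, head);

	/* Clear the bit and tell the caller whether it had been set. */
	return obj && test_and_clear_bit(marker, obj->markers);
}
```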
* Re: [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status
2025-01-10 3:19 ` Gerry Liu
@ 2025-01-10 16:58 ` Mario Limonciello
2025-01-10 17:16 ` Alex Deucher
0 siblings, 1 reply; 27+ messages in thread
From: Mario Limonciello @ 2025-01-10 16:58 UTC (permalink / raw)
To: Gerry Liu, Koenig, Christian, Alex Deucher
Cc: Xinhui.Pan, airlied, simona, sunil.khatri, lijo.lazar,
Hawking.Zhang, Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu,
amd-gfx
On 1/9/2025 21:19, Gerry Liu wrote:
>
>
>> 2025年1月9日 01:19,Mario Limonciello <mario.limonciello@amd.com
>> <mailto:mario.limonciello@amd.com>> 写道:
>>
>> On 1/8/2025 07:59, Jiang Liu wrote:
>>> Add a flag to track ras debugfs creation status, to avoid possible
>>> incorrect reference count management for ras block object in function
>>> amdgpu_ras_aca_is_supported().
>>
>> Rather than taking a marker position, why not just check for
>> obj->fs_data.debugfs_name to be non NULL in amdgpu_ras_fs_fini()?
> I plan to use the marker as a common status-tracking mechanism, so I used
> a marker here :)
>
>>
>>> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com
>>> <mailto:gerry@linux.alibaba.com>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++++++--
>>> 2 files changed, 8 insertions(+), 3 deletions(-)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/
>>> amd/amdgpu/amdgpu.h
>>> index 32941f29507c..2ef7d3102be3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -378,7 +378,7 @@ enum amdgpu_marker {
>>> AMDGPU_MARKER_IRQ6= 6,
>>> AMDGPU_MARKER_IRQ7= 7,
>>> AMDGPU_MARKER_IRQ_MAX= 47,
>>> -AMDGPU_MARKER_DEBUGFS= 63,
>>> +AMDGPU_MARKER_RAS_DEBUGFS= 63,
>>
>> Any particular reason you jumped to 63 in this patch and then counted
>> down in the next one? IE why not throw it at 48 (and then 49 for next
>> one)?
> I’m not sure how many markers are needed for IRQ, so I split the space
> into two parts: one for irq and one for others.
I think it's up to Alex and Christian here but as it's all internal to
amdgpu we can always reassign the enum in the future.
So I think we should just take 48 and 49. If another IRQ is needed we
can slot it in at 48 and bump these by 1.
Alex, Christian, thoughts?
>
>>
>>> };
>>> #define AMDGPU_MARKER_INDEX_IRQ(idx)(AMDGPU_MARKER_INDEX_IRQ0 + (idx))
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/
>>> drm/amd/amdgpu/amdgpu_ras.c
>>> index 6d52e22691f7..efd72b07a185 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>> @@ -1996,7 +1996,8 @@ static void amdgpu_ras_debugfs_create(struct
>>> amdgpu_device *adev,
>>> {
>>> struct ras_manager *obj = amdgpu_ras_find_obj(adev, &head->head);
>>> -if (!obj || !dir)
>>> +if (!obj || !dir ||
>>> + amdgpu_ras_test_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS))
>>> return;
>>> get_obj(obj);
>>> @@ -2007,6 +2008,8 @@ static void amdgpu_ras_debugfs_create(struct
>>> amdgpu_device *adev,
>>> debugfs_create_file(obj->fs_data.debugfs_name, S_IWUGO | S_IRUGO, dir,
>>> obj, &amdgpu_ras_debugfs_ops);
>>> +
>>> +amdgpu_ras_set_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS);
>>> }
>>> static bool amdgpu_ras_aca_is_supported(struct amdgpu_device *adev)
>>> @@ -2134,7 +2137,9 @@ static int amdgpu_ras_fs_fini(struct
>>> amdgpu_device *adev)
>>> if (IS_ENABLED(CONFIG_DEBUG_FS)) {
>>> list_for_each_entry_safe(con_obj, tmp, &con->head, node) {
>>> ip_obj = amdgpu_ras_find_obj(adev, &con_obj->head);
>>> -if (ip_obj)
>>> +if (ip_obj &&
>>> + amdgpu_ras_test_and_clear_marker(adev, &ip_obj->head,
>>> + AMDGPU_MARKER_RAS_DEBUGFS))
>>> put_obj(ip_obj);
>>> }
>>> }
>
* Re: [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status
2025-01-10 16:58 ` Mario Limonciello
@ 2025-01-10 17:16 ` Alex Deucher
0 siblings, 0 replies; 27+ messages in thread
From: Alex Deucher @ 2025-01-10 17:16 UTC (permalink / raw)
To: Mario Limonciello
Cc: Gerry Liu, Koenig, Christian, Alex Deucher, Xinhui.Pan, airlied,
simona, sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2,
xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx
On Fri, Jan 10, 2025 at 12:10 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 1/9/2025 21:19, Gerry Liu wrote:
> >
> >
> >> 2025年1月9日 01:19,Mario Limonciello <mario.limonciello@amd.com
> >> <mailto:mario.limonciello@amd.com>> 写道:
> >>
> >> On 1/8/2025 07:59, Jiang Liu wrote:
> >>> Add a flag to track ras debugfs creation status, to avoid possible
> >>> incorrect reference count management for ras block object in function
> >>> amdgpu_ras_aca_is_supported().
> >>
> >> Rather than taking a marker position, why not just check for
> >> obj->fs_data.debugfs_name to be non NULL in amdgpu_ras_fs_fini()?
> > I plan to use the marker as a common status-tracking mechanism, so I used
> > a marker here :)
> >
> >>
> >>> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com
> >>> <mailto:gerry@linux.alibaba.com>>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++++++--
> >>> 2 files changed, 8 insertions(+), 3 deletions(-)
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/
> >>> amd/amdgpu/amdgpu.h
> >>> index 32941f29507c..2ef7d3102be3 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> @@ -378,7 +378,7 @@ enum amdgpu_marker {
> >>> AMDGPU_MARKER_IRQ6= 6,
> >>> AMDGPU_MARKER_IRQ7= 7,
> >>> AMDGPU_MARKER_IRQ_MAX= 47,
> >>> -AMDGPU_MARKER_DEBUGFS= 63,
> >>> +AMDGPU_MARKER_RAS_DEBUGFS= 63,
> >>
> >> Any particular reason you jumped to 63 in this patch and then counted
> >> down in the next one? IE why not throw it at 48 (and then 49 for next
> >> one)?
> > I’m not sure how many markers are needed for IRQ, so I split the space
> > into two parts: one for irq and one for others.
>
> I think it's up to Alex and Christian here but as it's all internal to
> amdgpu we can always reassign the enum in the future.
>
> So I think we should just take 48 and 49. If another IRQ is needed we
> can slot it in at 48 and bump these by 1.
Why explicitly assign numbers to these? If we just use them as an
enum we can add enums in the future wherever we want in the sequence.
Alex
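As an illustration of that point (a sketch only, ignoring the IRQ index-space reservation from the quoted patch; AMDGPU_MARKER_RAS_FEATURE is a hypothetical name standing in for the marker added by the next patch):
```
/*
 * With implicit values, a new entry can be inserted anywhere later on and
 * everything after it simply renumbers; no explicit constants to maintain.
 */
enum amdgpu_marker {
	AMDGPU_MARKER_IRQ0,
	AMDGPU_MARKER_IRQ1,
	/* ... AMDGPU_MARKER_IRQ2 through AMDGPU_MARKER_IRQ7 ... */
	/* a new IRQ marker would simply be added here */
	AMDGPU_MARKER_RAS_DEBUGFS,
	AMDGPU_MARKER_RAS_FEATURE,	/* hypothetical, from the next patch */
	AMDGPU_MARKER_MAX,
};
```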
>
> Alex, Christian, thoughts?
>
> >
> >>
> >>> };
> >>> #define AMDGPU_MARKER_INDEX_IRQ(idx)(AMDGPU_MARKER_INDEX_IRQ0 + (idx))
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/
> >>> drm/amd/amdgpu/amdgpu_ras.c
> >>> index 6d52e22691f7..efd72b07a185 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >>> @@ -1996,7 +1996,8 @@ static void amdgpu_ras_debugfs_create(struct
> >>> amdgpu_device *adev,
> >>> {
> >>> struct ras_manager *obj = amdgpu_ras_find_obj(adev, &head->head);
> >>> -if (!obj || !dir)
> >>> +if (!obj || !dir ||
> >>> + amdgpu_ras_test_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS))
> >>> return;
> >>> get_obj(obj);
> >>> @@ -2007,6 +2008,8 @@ static void amdgpu_ras_debugfs_create(struct
> >>> amdgpu_device *adev,
> >>> debugfs_create_file(obj->fs_data.debugfs_name, S_IWUGO | S_IRUGO, dir,
> >>> obj, &amdgpu_ras_debugfs_ops);
> >>> +
> >>> +amdgpu_ras_set_marker(adev, &head->head, AMDGPU_MARKER_RAS_DEBUGFS);
> >>> }
> >>> static bool amdgpu_ras_aca_is_supported(struct amdgpu_device *adev)
> >>> @@ -2134,7 +2137,9 @@ static int amdgpu_ras_fs_fini(struct
> >>> amdgpu_device *adev)
> >>> if (IS_ENABLED(CONFIG_DEBUG_FS)) {
> >>> list_for_each_entry_safe(con_obj, tmp, &con->head, node) {
> >>> ip_obj = amdgpu_ras_find_obj(adev, &con_obj->head);
> >>> -if (ip_obj)
> >>> +if (ip_obj &&
> >>> + amdgpu_ras_test_and_clear_marker(adev, &ip_obj->head,
> >>> + AMDGPU_MARKER_RAS_DEBUGFS))
> >>> put_obj(ip_obj);
> >>> }
> >>> }
> >
>
* Re:
2025-01-09 17:10 ` Re: Mario Limonciello
@ 2025-01-13 1:19 ` Gerry Liu
2025-01-13 21:59 ` Re: Mario Limonciello
0 siblings, 1 reply; 27+ messages in thread
From: Gerry Liu @ 2025-01-13 1:19 UTC (permalink / raw)
To: Mario Limonciello
Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang,
Kent.Russell, Shuo Liu, amd-gfx
> 2025年1月10日 01:10,Mario Limonciello <mario.limonciello@amd.com> 写道:
>
> General note - don't use HTML for mailing list communication.
>
> I'm not sure if Apple Mail lets you switch this around.
>
> If not, you might try using Thunderbird instead. You can pick to reply in plain text or HTML by holding shift when you hit "reply all"
>
> For my reply I'll convert my reply to plain text, please see inline below.
>
> On 1/8/2025 23:34, Gerry Liu wrote:
>>> 2025年1月9日 00:33,Mario Limonciello <mario.limonciello@amd.com <mailto:mario.limonciello@amd.com>> 写道:
>>>
>>> On 1/8/2025 07:59, Jiang Liu wrote:
>>>> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
>>>
>>> I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0 so the thread just looks like an unsubjected thread.
>> Maybe it’s caused by one extra blank line at the header.
>
> Yeah that might be it. Hopefully it doesn't happen on v2.
>
>>>
>>>> Recently we were testing suspend/resume functionality with AMD GPUs,
>>>> we have encountered several resource tracking related bugs, such as
>>>> double buffer free, use after free and unbalanced irq reference count.
>>>
>>> Can you share more about how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU?
>>> Are they only with SRIOV?
>>>
>>> Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs?
>>>
>>> I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume.
>> We are investigating to develop some advanced product features based on amdgpu suspend/resume.
>> So we started by tested the suspend/resume functionality of AMD 308x GPUs with the following simple script:
>> ```
>> echo platform >/sys/power/pm_test
>> i=0
>> while true; do
>>     echo mem >/sys/power/state
>>     let i=i+1
>>     echo $i
>>     sleep 1
>> done
>> ```
>> It succeeds with the first and second iteration but always fails on following iterations on a bare metal servers with eight MI308X GPUs.
>
> Can you share more about this server? Does it support suspend to ram or a hardware backed suspend to idle? If you don't know, you can check like this:
>
> ❯ cat /sys/power/mem_sleep
> s2idle [deep]
# cat /sys/power/mem_sleep
[s2idle]
>
> If it's suspend to idle, what does the FACP indicate? You can do this check to find out if you don't know.
>
> ❯ sudo cp /sys/firmware/acpi/tables/FACP /tmp
> ❯ sudo iasl -d /tmp/FACP
> ❯ grep "idle" -i /tmp/FACP.dsl
> Low Power S0 Idle (V5) : 0
>
With acpidump and `iasl -d facp.data`, we got:
[070h 0112 4] Flags (decoded below) : 000084A5
WBINVD instruction is operational (V1) : 1
WBINVD flushes all caches (V1) : 0
All CPUs support C1 (V1) : 1
C2 works on MP system (V1) : 0
Control Method Power Button (V1) : 0
Control Method Sleep Button (V1) : 1
RTC wake not in fixed reg space (V1) : 0
RTC can wake system from S4 (V1) : 1
32-bit PM Timer (V1) : 0
Docking Supported (V1) : 0
Reset Register Supported (V2) : 1
Sealed Case (V3) : 0
Headless - No Video (V3) : 0
Use native instr after SLP_TYPx (V3) : 0
PCIEXP_WAK Bits Supported (V4) : 0
Use Platform Timer (V4) : 1
RTC_STS valid on S4 wake (V4) : 0
Remote Power-on capable (V4) : 0
Use APIC Cluster Model (V4) : 0
Use APIC Physical Destination Mode (V4) : 0
Hardware Reduced (V5) : 0
Low Power S0 Idle (V5) : 0
>> With some investigation we found that the gpu asic should be reset during the test,
>
> Yeah; but this comes back to my above questions. Typically there is an assumption that the power rails are going to be cut in system suspend.
>
> If that doesn't hold true, then you're doing a pure software suspend and have found a series of issues in the driver with how that's handled.
Yeah, we are trying to do a `pure software suspend`, letting the hypervisor save/restore system images instead of the guest OS.
And during the suspend process, we hope we can cancel the suspend request at any later stage.
When we cancel suspend at a late stage, it does behave like a pure software suspend.
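For context, the test loop quoted above relies on the kernel's standard suspend test facility (Documentation/power/basic-pm-debugging.rst); a short sketch of how it is driven, assuming a kernel built with PM debug support:
```
# The available test levels include freezer, devices, platform, processors,
# core and none.
cat /sys/power/pm_test
# With "platform" selected, "echo mem > /sys/power/state" walks the whole
# suspend path, device and platform callbacks included, and then resumes
# without the firmware ever entering the sleep state.
echo platform >/sys/power/pm_test
echo mem >/sys/power/state
```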
>
>> so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181)
>
> Typically kernel patches don't go through that repo, they're discussed on the mailing lists. Can you bring this patch for discussion on amd-gfx?
Will post to amd-gfx after solving the conflicts.
Regards,
Gerry
>
>> During analyze and root-cause the failure, we have encountered several crashes, resource leakages and false alarms.
>
> Yeah; I think you found some real issues.
>
>> So I have worked out patch sets to solve issues we encountered. The other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html
>
> Thanks!
>
>> With sriov in single VF mode, resume always fails. Seems some contexts/ vram buffers get lost during suspend and haven’t be restored on resume, so cause failure.
>> We haven’t tested sriov in multiple VFs mode yet. We need more help from AMD side to make SR work for SRIOV:)
>>>
>>>> We have tried to solve these issues case by case, but found that may
>>>> not be the right way. Especially about the unbalanced irq reference
>>>> count, there will be new issues appear once we fixed the current known
>>>> issues. After analyzing related source code, we found that there may be
>>>> some fundamental implementaion flaws behind these resource tracking
>>>
>>> implementation
>>>
>>>> issues.
>>>> The amdgpu driver has two major state machines to driver the device
>>>> management flow, one is for ip blocks, the other is for ras blocks.
>>>> The hook points defined in struct amd_ip_funcs for device setup/teardown
>>>> are symmetric, but the implementation is asymmetric, sometime even
>>>> ambiguous. The most obvious two issues we noticed are:
>>>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put()
>>>> are called from .hw_fini() instead of .early_fini().
>>>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't
>>>> match the way to set those flags.
>>>> When taking device suspend/resume into account, in addition to device
>>>> probe/remove, things get much more complex. Some issues arise because
>>>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/
>>>> .late_init hook points.
>>>>
>>>> So we try to fix those issues by two enhancements/refinements to current
>>>> device management state machines.
>>>> The first change is to make the ip block state machine and associated
>>>> status flags work in stack-like way as below:
>>>> Callback Status Flags
>>>> early_init: valid = true
>>>> sw_init: sw = true
>>>> hw_init: hw = true
>>>> late_init: late_initialized = true
>>>> early_fini: late_initialized = false
>>>> hw_fini: hw = false
>>>> sw_fini: sw = false
>>>> late_fini: valid = false
>>>
>>> At a high level this makes sense to me, but I'd just call 'late' or 'late_init'.
>>>
>>> Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable.
>> I will add a patch to convert those bool flags into an enum.
>
> Thanks!
* Re:
2025-01-13 1:19 ` Re: Gerry Liu
@ 2025-01-13 21:59 ` Mario Limonciello
0 siblings, 0 replies; 27+ messages in thread
From: Mario Limonciello @ 2025-01-13 21:59 UTC (permalink / raw)
To: Gerry Liu
Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona,
sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang,
Kent.Russell, Shuo Liu, amd-gfx
On 1/12/2025 19:19, Gerry Liu wrote:
>
>
>> 2025年1月10日 01:10,Mario Limonciello <mario.limonciello@amd.com> 写道:
>>
>> General note - don't use HTML for mailing list communication.
>>
>> I'm not sure if Apple Mail lets you switch this around.
>>
>> If not, you might try using Thunderbird instead. You can pick to reply in plain text or HTML by holding shift when you hit "reply all"
>>
>> For my reply I'll convert my reply to plain text, please see inline below.
>>
>> On 1/8/2025 23:34, Gerry Liu wrote:
>>>> 2025年1月9日 00:33,Mario Limonciello <mario.limonciello@amd.com <mailto:mario.limonciello@amd.com>> 写道:
>>>>
>>>> On 1/8/2025 07:59, Jiang Liu wrote:
>>>>> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume
>>>>
>>>> I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0 so the thread just looks like an unsubjected thread.
>>> Maybe it’s caused by one extra blank line at the header.
>>
>> Yeah that might be it. Hopefully it doesn't happen on v2.
>>
>>>>
>>>>> Recently we were testing suspend/resume functionality with AMD GPUs,
>>>>> we have encountered several resource tracking related bugs, such as
>>>>> double buffer free, use after free and unbalanced irq reference count.
>>>>
>>>> Can you share more about how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU?
>>>> Are they only with SRIOV?
>>>>
>>>> Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs?
>>>>
>>>> I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume.
>>> We are investigating to develop some advanced product features based on amdgpu suspend/resume.
>>> So we started by tested the suspend/resume functionality of AMD 308x GPUs with the following simple script:
>>> ```
>>> echo platform >/sys/power/pm_test
>>> i=0
>>> while true; do
>>>     echo mem >/sys/power/state
>>>     let i=i+1
>>>     echo $i
>>>     sleep 1
>>> done
>>> ```
>>> It succeeds with the first and second iteration but always fails on following iterations on a bare metal servers with eight MI308X GPUs.
>>
>> Can you share more about this server? Does it support suspend to ram or a hardware backed suspend to idle? If you don't know, you can check like this:
>>
>> ❯ cat /sys/power/mem_sleep
>> s2idle [deep]
> # cat /sys/power/mem_sleep
> [s2idle]
>
>>
>> If it's suspend to idle, what does the FACP indicate? You can do this check to find out if you don't know.
>>
>> ❯ sudo cp /sys/firmware/acpi/tables/FACP /tmp
>> ❯ sudo iasl -d /tmp/FACP
>> ❯ grep "idle" -i /tmp/FACP.dsl
>> Low Power S0 Idle (V5) : 0
>>
> With acpidump and `iasl -d facp.data`, we got:
> [070h 0112 4] Flags (decoded below) : 000084A5
> WBINVD instruction is operational (V1) : 1
> WBINVD flushes all caches (V1) : 0
> All CPUs support C1 (V1) : 1
> C2 works on MP system (V1) : 0
> Control Method Power Button (V1) : 0
> Control Method Sleep Button (V1) : 1
> RTC wake not in fixed reg space (V1) : 0
> RTC can wake system from S4 (V1) : 1
> 32-bit PM Timer (V1) : 0
> Docking Supported (V1) : 0
> Reset Register Supported (V2) : 1
> Sealed Case (V3) : 0
> Headless - No Video (V3) : 0
> Use native instr after SLP_TYPx (V3) : 0
> PCIEXP_WAK Bits Supported (V4) : 0
> Use Platform Timer (V4) : 1
> RTC_STS valid on S4 wake (V4) : 0
> Remote Power-on capable (V4) : 0
> Use APIC Cluster Model (V4) : 0
> Use APIC Physical Destination Mode (V4) : 0
> Hardware Reduced (V5) : 0
> Low Power S0 Idle (V5) : 0
>
>>> With some investigation we found that the gpu asic should be reset during the test,
>>
>> Yeah; but this comes back to my above questions. Typically there is an assumption that the power rails are going to be cut in system suspend.
>>
>> If that doesn't hold true, then you're doing a pure software suspend and have found a series of issues in the driver with how that's handled.
> Yeah, we are trying to do a `pure software suspend`, letting the hypervisor save/restore system images instead of the guest OS.
> And during the suspend process, we hope we can cancel the suspend request at any later stage.
> When we cancel suspend at a late stage, it does behave like a pure software suspend.
>
Thanks; this all makes a lot more sense now. This isn't an area that
has a lot of coverage right now. Most suspend testing happens with the
power being cut and coming back fresh.
Will keep this in mind when reviewing future iterations of your patches.
>>
>>> so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181)
>>
>> Typically kernel patches don't go through that repo, they're discussed on the mailing lists. Can you bring this patch for discussion on amd-gfx?
> Will post to amd-gfx after solving the conflicts.
Thx!
>
> Regards,
> Gerry
>
>>
>>> During analyze and root-cause the failure, we have encountered several crashes, resource leakages and false alarms.
>>
>> Yeah; I think you found some real issues.
>>
>>> So I have worked out patch sets to solve issues we encountered. The other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html
>>
>> Thanks!
>>
>>> With sriov in single VF mode, resume always fails. Seems some contexts/ vram buffers get lost during suspend and haven’t be restored on resume, so cause failure.
>>> We haven’t tested sriov in multiple VFs mode yet. We need more help from AMD side to make SR work for SRIOV:)
>>>>
>>>>> We have tried to solve these issues case by case, but found that may
>>>>> not be the right way. Especially about the unbalanced irq reference
>>>>> count, there will be new issues appear once we fixed the current known
>>>>> issues. After analyzing related source code, we found that there may be
>>>>> some fundamental implementaion flaws behind these resource tracking
>>>>
>>>> implementation
>>>>
>>>>> issues.
>>>>> The amdgpu driver has two major state machines to driver the device
>>>>> management flow, one is for ip blocks, the other is for ras blocks.
>>>>> The hook points defined in struct amd_ip_funcs for device setup/teardown
>>>>> are symmetric, but the implementation is asymmetric, sometime even
>>>>> ambiguous. The most obvious two issues we noticed are:
>>>>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put()
>>>>> are called from .hw_fini() instead of .early_fini().
>>>>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't
>>>>> match the way to set those flags.
>>>>> When taking device suspend/resume into account, in addition to device
>>>>> probe/remove, things get much more complex. Some issues arise because
>>>>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/
>>>>> .late_init hook points.
>>>>>
>>>>> So we try to fix those issues by two enhancements/refinements to current
>>>>> device management state machines.
>>>>> The first change is to make the ip block state machine and associated
>>>>> status flags work in stack-like way as below:
>>>>> Callback Status Flags
>>>>> early_init: valid = true
>>>>> sw_init: sw = true
>>>>> hw_init: hw = true
>>>>> late_init: late_initialized = true
>>>>> early_fini: late_initialized = false
>>>>> hw_fini: hw = false
>>>>> sw_fini: sw = false
>>>>> late_fini: valid = false
>>>>
>>>> At a high level this makes sense to me, but I'd just call 'late' or 'late_init'.
>>>>
>>>> Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable.
>>> I will add a patch to convert those bool flags into an enum.
>>
>> Thanks!
>
Thread overview: 27+ messages
2025-01-08 13:59 Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 01/13] amdgpu: wrong array index to get ip block for PSP Jiang Liu
2025-01-08 15:02 ` Alex Deucher
2025-01-08 13:59 ` [RFC PATCH 02/13] drm/admgpu: add helper functions to track status for ras manager Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status Jiang Liu
2025-01-08 17:19 ` Mario Limonciello
2025-01-10 3:19 ` Gerry Liu
2025-01-10 16:58 ` Mario Limonciello
2025-01-10 17:16 ` Alex Deucher
2025-01-08 13:59 ` [RFC PATCH 04/13] drm/amdgpu: free all resources on error recovery path of amdgpu_ras_init() Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 05/13] drm/amdgpu: introduce a flag to track refcount held for features Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 06/13] drm/amdgpu: enhance amdgpu_ras_block_late_fini() Jiang Liu
2025-01-08 13:59 ` [RFC PATCH 07/13] drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 08/13] drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 09/13] drm/amdgpu: make IP block state machine works in stack like way Jiang Liu
2025-01-08 17:04 ` Mario Limonciello
2025-01-08 14:00 ` [RFC PATCH 10/13] drm/admgpu: make device state machine work " Jiang Liu
2025-01-08 17:13 ` Mario Limonciello
2025-01-08 14:00 ` [RFC PATCH 11/13] drm/amdgpu/sdma: improve the way to manage irq reference count Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 12/13] drm/amdgpu/nbio: " Jiang Liu
2025-01-08 14:00 ` [RFC PATCH 13/13] drm/amdgpu/asic: make ip block operations symmetric by .early_fini() Jiang Liu
2025-01-08 14:10 ` Christian König
2025-01-08 16:33 ` Re: Mario Limonciello
2025-01-09 5:34 ` Re: Gerry Liu
2025-01-09 17:10 ` Re: Mario Limonciello
2025-01-13 1:19 ` Re: Gerry Liu
2025-01-13 21:59 ` Re: Mario Limonciello