All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/4] enable xgmi node migration support for hibernate on SRIOV.
@ 2025-05-22  3:43 Samuel Zhang
  2025-05-22  3:43 ` [PATCH v7 1/4] drm/amdgpu: update xgmi info and vram_base_offset on resume Samuel Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Samuel Zhang @ 2025-05-22  3:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: victor.zhao, haijun.chang, guoqing.zhang, Christian.Koenig,
	Alexander.Deucher, Owen.Zhang2, Qing.Ma, Lijo.Lazar, Emily.Deng

On SRIOV and VM environment, customer may need to switch to new vGPU indexes after hibernate and then resume the VM. For GPUs with XGMI, `vram_start` will change in this case, the FB aperture gpu address of VRAM BOs will also change.
These gpu addresses need to be updated when resume. But these addresses are all over the KMD codebase, updating each of them is error-prone and not acceptable.

The solution is to use pdb0 page table to cover both vram and gart memory and use pdb0 virtual gpu address instead. When gpu indexes change, the virtual gpu address won't change.

For psp and smu, pdb0's gpu address does not work, so the original FB aperture gpu address is used instead. They need to be updated when resume with changed vGPUs.

v2:
- remove physical_node_id_changed
- set vram_start to 0 to switch cached gpu addr to gart aperture
- cleanup pdb0 patch
v3:
- remove gmc_v9_0_init_sw_mem_ranges() call
- remove vram_offset memeber
- add 4 refactoring patch to remove cached gpu addr
- cleanup pdb0 patch
v4:
- remove gmc_v9_0_mc_init() call and `refresh` update.
- do not set `fb_start` in mmhub_v1_8_get_fb_location() when pdb0 enabled.
v5:
- add amdgpu_virt_xgmi_migrate_enabled() check
- move vram_base_offset update to pdb0 patch
- remove 4 refactoring patches to remove cached gpu addr
- add patch to fix IH not working issue when resume with new VF
v6: per Lijo feedback
- rename amdgpu_device_update_xgmi_info() to amdgpu_virt_resume()
- merge xgmi node and vram_base_offset update, IH fix into amdgpu_virt_resume()
- remove 2 unnecessary gpu addr update changes
v7: per Christian feedback
- remove pdb0_enabled and add gmc_v9_0_is_pdb0_enabled()
- remove amdgpu_gmc_vram_location() call in amdgpu_gmc_sysvm_location()
- remove check in mmhub_v1_8_get_fb_location() and update fb_start/fb_end on resume

Samuel Zhang (4):
  drm/amdgpu: update xgmi info and vram_base_offset on resume
  drm/amdgpu: update GPU addresses for SMU and PSP
  drm/amdgpu: enable pdb0 for hibernation on SRIOV
  drm/amdgpu: fix fence fallback timer expired error

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c    | 23 +++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 20 ++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 23 +++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  7 ++++
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 13 +++++---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 18 ++++++++++
 12 files changed, 143 insertions(+), 8 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-05-22  9:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22  3:43 [PATCH v7 0/4] enable xgmi node migration support for hibernate on SRIOV Samuel Zhang
2025-05-22  3:43 ` [PATCH v7 1/4] drm/amdgpu: update xgmi info and vram_base_offset on resume Samuel Zhang
2025-05-22  7:07   ` Lazar, Lijo
2025-05-22  7:51     ` Lazar, Lijo
2025-05-22  3:43 ` [PATCH v7 2/4] drm/amdgpu: update GPU addresses for SMU and PSP Samuel Zhang
2025-05-22  7:44   ` Lazar, Lijo
2025-05-22  9:09   ` Christian König
2025-05-22  3:43 ` [PATCH v7 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV Samuel Zhang
2025-05-22  7:46   ` Lazar, Lijo
2025-05-22  9:25   ` Christian König
2025-05-22  3:43 ` [PATCH v7 4/4] drm/amdgpu: fix fence fallback timer expired error Samuel Zhang
2025-05-22  7:48   ` Lazar, Lijo
2025-05-22  9:27   ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.