Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Zhang, GuoQing (Sam)" <GuoQing.Zhang@amd.com>
To: "Koenig, Christian" <Christian.Koenig@amd.com>,
	"Zhang, GuoQing (Sam)" <GuoQing.Zhang@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Zhao, Victor" <Victor.Zhao@amd.com>,
	"Chang, HaiJun" <HaiJun.Chang@amd.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Zhang, Owen(SRDC)" <Owen.Zhang2@amd.com>,
	"Ma, Qing (Mark)" <Qing.Ma@amd.com>,
	"Lazar, Lijo" <Lijo.Lazar@amd.com>,
	"Deng, Emily" <Emily.Deng@amd.com>
Subject: Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV
Date: Wed, 21 May 2025 11:55:27 +0000	[thread overview]
Message-ID: <8b2ea507-403c-471a-a3d4-db23f3d2f096@amd.com> (raw)
In-Reply-To: <67fc5bc0-e1a7-4074-ab4e-bd7735b60cd1@amd.com>

[-- Attachment #1: Type: text/plain, Size: 6234 bytes --]


On 2025/5/21 16:06, Christian König wrote:
> On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote:
>>>> +    if (amdgpu_virt_xgmi_migrate_enabled(adev)) {
>>>> +            /* set mc->vram_start to 0 to switch the returned GPU address of
>>>> +             * amdgpu_bo_create_reserved() from FB aperture to GART aperture.
>>>> +             */
>>>> +            amdgpu_gmc_vram_location(adev, mc, 0);
>>> This function does a lot more than just setting mc->vram_start and mc->vram_end.
>>>
>>> You should probably just update the two setting and not call amdgpu_gmc_vram_location() at all.
>> I tried only setting mc->vram_start and mc->vram_end. But KMD load will
>> fail with following error logs.
>>
>> [  329.314346] amdgpu 0000:09:00.0: amdgpu: VRAM: 196288M
>> 0x0000000000000000 - 0x0000002FEBFFFFFF (196288M used)
>> [  329.314348] amdgpu 0000:09:00.0: amdgpu: GART: 512M
>> 0x0000018000000000 - 0x000001801FFFFFFF
>> [  329.314385] [drm] Detected VRAM RAM=196288M, BAR=262144M
>> [  329.314386] [drm] RAM width 8192bits HBM
>> [  329.314546] amdgpu 0000:09:00.0: amdgpu: (-22) failed to allocate
>> kernel bo
>> [  329.315013] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP
>> block <gmc_v9_0> failed -22
>> [  329.315846] amdgpu 0000:09:00.0: amdgpu: amdgpu_device_ip_init failed
>>
>>
>> It seems like setting mc->visible_vram_size and mc->visible_vram_size
>> fields are also needed. In this case call amdgpu_gmc_vram_location() is
>> better than inline the logic, I think.
> Yeah, exactly that is not a good idea.
>
> The mc->visible_vram_size and mc->real_vram_size should have been initialized by gmc_v9_0_mc_init(). Why didn't that happen?


[Sam] visible_vram_size is set to 0x4000000000 (256G) from
`pci_resource_len(adev->pdev, 0)` in `gmc_v9_0_mc_init()`.
It is set to real_vram_size 0x2fec000000(192G) in
amdgpu_gmc_vram_location().

Should I update the 3 variables inline and not call
amdgpu_gmc_vram_location()?

         mc->vram_start = 0;
         mc->vram_end = mc->vram_start + mc->mc_vram_size - 1;
         if (mc->real_vram_size < mc->visible_vram_size)
             mc->visible_vram_size = mc->real_vram_size;


>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
>>>> index 84cde1239ee4..18e80aa78aff 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
>>>> @@ -45,8 +45,10 @@ static u64 mmhub_v1_8_get_fb_location(struct amdgpu_device *adev)
>>>>         top &= MC_VM_FB_LOCATION_TOP__FB_TOP_MASK;
>>>>         top <<= 24;
>>>>
>>>> -    adev->gmc.fb_start = base;
>>>> -    adev->gmc.fb_end = top;
>>>> +    if (!amdgpu_virt_xgmi_migrate_enabled(adev)) {
>>>> +            adev->gmc.fb_start = base;
>>>> +            adev->gmc.fb_end = top;
>>>> +    }
>>> We should probably avoid calling this in the first place.
>>>
>>> The function gmc_v9_0_vram_gtt_location() should probably be adjusted.
>> mmhub_v1_8_get_fb_location() is called by the new
>> amdgpu_bo_fb_aper_addr() as well, not just gmc_v9_0_vram_gtt_location().
> Oh, that is probably a bad idea. The function amdgpu_bo_fb_aper_addr() should only rely on cached data.


[Sam] Can I add new `fb_base` field in `struct amdgpu_gmc` to cache the
value of `get_fb_location()`?
Using this approach, we don't need to set fb_start and fb_end on resume
any more, since the reset of the 2 field is caused by
mmhub_v1_8_get_fb_location() calls from amdgpu_bo_fb_aper_addr().
Please see the code change below.

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -259,6 +259,7 @@ struct amdgpu_gmc {
          */
         u64                     fb_start;
         u64                     fb_end;
+       u64                     fb_base;
         unsigned                vram_width;
         u64                     real_vram_size;
         int                     vram_mtrr;

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1527,7 +1527,7 @@ u64 amdgpu_bo_fb_aper_addr(struct amdgpu_bo *bo)

         WARN_ON_ONCE(bo->tbo.resource->mem_type != TTM_PL_VRAM);

-       fb_base = adev->mmhub.funcs->get_fb_location(adev);
+       fb_base = adev->gmc.fb_base;
         fb_base += adev->gmc.xgmi.physical_node_id *
adev->gmc.xgmi.node_segment_size;
         offset = (bo->tbo.resource->start << PAGE_SHIFT) + fb_base;
         return amdgpu_gmc_sign_extend(offset);

--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1728,6 +1728,7 @@ static void gmc_v9_0_vram_gtt_location(struct
amdgpu_device *adev,
                                         struct amdgpu_gmc *mc)
  {
         u64 base = adev->mmhub.funcs->get_fb_location(adev);
+       mc->fb_base = base;

         /* add the xgmi offset of the physical node */
         base += adev->gmc.xgmi.physical_node_id *
adev->gmc.xgmi.node_segment_size;

--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
@@ -45,10 +45,8 @@ static u64 mmhub_v1_8_get_fb_location(struct
amdgpu_device *adev)
         top &= MC_VM_FB_LOCATION_TOP__FB_TOP_MASK;
         top <<= 24;

-       if (!amdgpu_virt_xgmi_migrate_enabled(adev)) {
-               adev->gmc.fb_start = base;
-               adev->gmc.fb_end = top;
-       }
+       adev->gmc.fb_start = base;
+       adev->gmc.fb_end = top;


Regards
Sam


>
>> mmhub_v1_8_get_fb_location() is supposed to be a query api according to
>> its name. having such side effect is very surprising.
>>
>> Another approach is set the right fb_start and fb_end in the new
>> amdgpu_virt_resume(), like updating vram_base_offset.
> That is probably better. And skip setting fb_start and fb_end in amdgpu_gmc_sysvm_location() for this use case.
>
> That was done only because we re-program those registers on bare metal.
>
> Regards,
> Christian.
>
>> Which approach do you prefer? Or any better suggestions? Thank you.
>>
>>
>> Regards
>> Sam
>>
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>         return base;
>>>>     }

[-- Attachment #2: Type: text/html, Size: 11295 bytes --]

next prev parent reply	other threads:[~2025-05-21 11:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-19  8:20 [PATCH v6 0/4] enable xgmi node migration support for hibernate on SRIOV Samuel Zhang
2025-05-19  8:20 ` [PATCH v6 1/4] drm/amdgpu: update xgmi info and vram_base_offset on resume Samuel Zhang
2025-05-19 13:41   ` Christian König
2025-05-19  8:20 ` [PATCH v6 2/4] drm/amdgpu: update GPU addresses for SMU and PSP Samuel Zhang
2025-05-19 13:43   ` Christian König
2025-05-19  8:20 ` [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV Samuel Zhang
2025-05-19 13:57   ` Christian König
2025-05-20  5:10     ` Zhang, GuoQing (Sam)
2025-05-21  7:33       ` Zhang, Owen(SRDC)
2025-05-21  8:06       ` Christian König
2025-05-21 11:55         ` Zhang, GuoQing (Sam) [this message]
2025-05-21 12:00           ` Christian König
2025-05-22  3:49             ` Zhang, GuoQing (Sam)
2025-05-19  8:20 ` [PATCH v6 4/4] drm/amdgpu: fix fence fallback timer expired error Samuel Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8b2ea507-403c-471a-a3d4-db23f3d2f096@amd.com \
    --to=guoqing.zhang@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Emily.Deng@amd.com \
    --cc=HaiJun.Chang@amd.com \
    --cc=Lijo.Lazar@amd.com \
    --cc=Owen.Zhang2@amd.com \
    --cc=Qing.Ma@amd.com \
    --cc=Victor.Zhao@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.