AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Nirujogi, Pratap" <pnirujog@amd.com>
To: Mario Limonciello <mario.limonciello@amd.com>,
	Pratap Nirujogi <pratap.nirujogi@amd.com>,
	amd-gfx@lists.freedesktop.org, mlimonci@amd.com,
	alexander.deucher@amd.com, christian.koenig@amd.com
Cc: benjamin.chan@amd.com, bin.du@amd.com,
	gjorgji.rosikopulos@amd.com, king.li@amd.com, dantony@amd.com,
	phil.jawich@amd.com, Gjorgji Rosikopulos <grosikop@amd.com>
Subject: Re: [PATCH] drm/amd/amdgpu: Fix SMU warning during isp suspend-resume
Date: Wed, 10 Dec 2025 23:01:09 -0500	[thread overview]
Message-ID: <772c2e4c-86b6-4200-9cbf-696e4652a6c0@amd.com> (raw)
In-Reply-To: <03db9561-f7e4-4157-95b3-7d32bdafe15f@amd.com>


On 12/10/2025 9:19 PM, Mario Limonciello wrote:
>
>
> On 12/10/2025 5:13 PM, Nirujogi, Pratap wrote:
>> Hi Mario,
>>
>> On 12/9/2025 10:28 PM, Mario Limonciello wrote:
>>>
>>>
>>> On 12/9/2025 7:50 PM, Pratap Nirujogi wrote:
>>>> ISP mfd child devices are using genpd and the system suspend-resume
>>>> operations between genpd and amdgpu parent device which uses only
>>>> runtime suspend-resume are not in sync.
>>>>
>>>> Linux power manager during suspend-resume resuming the genpd devices
>>>> earlier than the amdgpu parent device. This is resulting in the below
>>>> warning as SMU is in suspended state when genpd attempts to resume 
>>>> ISP.
>>>>
>>>> WARNING: CPU: 13 PID: 5435 at 
>>>> drivers/gpu/drm/amd/amdgpu/../pm/swsmu/ amdgpu_smu.c:398 
>>>> smu_dpm_set_power_gate+0x36f/0x380 [amdgpu]
>>>>
>>>> To fix this warning isp suspend-resume is handled as part of amdgpu
>>>> parent device suspend-resume instead of genpd sequence. Each ISP MFD
>>>> child device is marked as dev_pm_syscore_device to skip genpd
>>>> suspend-resume and use pm_runtime_force api's to suspend-resume
>>>> the devices when callbacks from amdgpu are received.
>>>>
>>>> Signed-off-by: Gjorgji Rosikopulos <grosikop@amd.com>
>>>> Signed-off-by: Bin Du <bin.du@amd.com>
>>>> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
>>>
>>> Who is the patch author?  If you guys worked together, there should 
>>> be Co-developed-by tags to represent it.
>>>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 24 ++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.h |  2 +
>>>>   drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c | 59 
>>>> +++++++++++++++++++++++++
>>>>   3 files changed, 85 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c b/drivers/gpu/ 
>>>> drm/amd/amdgpu/amdgpu_isp.c
>>>> index 37270c4dab8d..532f83d783d1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
>>>> @@ -318,12 +318,36 @@ void isp_kernel_buffer_free(void **buf_obj, 
>>>> u64 *gpu_addr, void **cpu_addr)
>>>>   }
>>>>   EXPORT_SYMBOL(isp_kernel_buffer_free);
>>>>   +static int isp_resume(struct amdgpu_ip_block *ip_block)
>>>> +{
>>>> +    struct amdgpu_device *adev = ip_block->adev;
>>>> +    struct amdgpu_isp *isp = &adev->isp;
>>>> +
>>>> +    if (isp->funcs->hw_resume)
>>>> +        return isp->funcs->hw_resume(isp);
>>>> +
>>>> +    return -ENODEV;
>>>> +}
>>>> +
>>>> +static int isp_suspend(struct amdgpu_ip_block *ip_block)
>>>> +{
>>>> +    struct amdgpu_device *adev = ip_block->adev;
>>>> +    struct amdgpu_isp *isp = &adev->isp;
>>>> +
>>>> +    if (isp->funcs->hw_suspend)
>>>> +        return isp->funcs->hw_suspend(isp);
>>>> +
>>>> +    return -ENODEV;
>>>> +}
>>>> +
>>>>   static const struct amd_ip_funcs isp_ip_funcs = {
>>>>       .name = "isp_ip",
>>>>       .early_init = isp_early_init,
>>>>       .hw_init = isp_hw_init,
>>>>       .hw_fini = isp_hw_fini,
>>>>       .is_idle = isp_is_idle,
>>>> +    .suspend = isp_suspend,
>>>> +    .resume = isp_resume,
>>>>       .set_clockgating_state = isp_set_clockgating_state,
>>>>       .set_powergating_state = isp_set_powergating_state,
>>>>   };
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.h b/drivers/gpu/ 
>>>> drm/amd/amdgpu/amdgpu_isp.h
>>>> index d6f4ffa4c97c..9a5d2b1dff9e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.h
>>>> @@ -38,6 +38,8 @@ struct amdgpu_isp;
>>>>   struct isp_funcs {
>>>>       int (*hw_init)(struct amdgpu_isp *isp);
>>>>       int (*hw_fini)(struct amdgpu_isp *isp);
>>>> +    int (*hw_suspend)(struct amdgpu_isp *isp);
>>>> +    int (*hw_resume)(struct amdgpu_isp *isp);
>>>>   };
>>>>     struct amdgpu_isp {
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c b/drivers/gpu/ 
>>>> drm/amd/amdgpu/isp_v4_1_1.c
>>>> index 4258d3e0b706..560c398e14fc 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c
>>>> @@ -26,6 +26,7 @@
>>>>    */
>>>>     #include <linux/gpio/machine.h>
>>>> +#include <linux/pm_runtime.h>
>>>>   #include "amdgpu.h"
>>>>   #include "isp_v4_1_1.h"
>>>>   @@ -145,6 +146,9 @@ static int isp_genpd_add_device(struct device 
>>>> *dev, void *data)
>>>>           return -ENODEV;
>>>>       }
>>>>   +    /* The devcies will be managed by the pm ops from the parent */
>>>
>>> devices
>>>
>>>> +    dev_pm_syscore_device(dev, true);
>>>> +
>>>>   exit:
>>>>       /* Continue to add */
>>>>       return 0;
>>>> @@ -177,12 +181,65 @@ static int isp_genpd_remove_device(struct 
>>>> device *dev, void *data)
>>>>           drm_err(&adev->ddev, "Failed to remove dev from genpd 
>>>> %d\n", ret);
>>>>           return -ENODEV;
>>>>       }
>>>> +    dev_pm_syscore_device(dev, false);
>>>>     exit:
>>>>       /* Continue to remove */
>>>>       return 0;
>>>>   }
>>>>   +static int isp_suspend_device(struct device *dev, void *data)
>>>> +{
>>>> +    struct platform_device *pdev = container_of(dev, struct 
>>>> platform_device, dev);
>>>> +
>>>> +    if (!dev->type || !dev->type->name)
>>>> +        return 0;
>>>> +    if (strncmp(dev->type->name, "mfd_device", 10))
>>>> +        return 0;
>>>> +    if (!strncmp(pdev->mfd_cell->name, "amdisp-pinctrl", 14))
>>>> +        return 0;
>>>
>>> Could we store the mfd_cell pointer instead and just compare the 
>>> pointers?
>>
>> I don't think I can do a pointer comparision to identify the correct 
>> mfd_cell, string comparision seems like required in this case.
>>
>> Its because when isp mfd child devices are created using 
>> mfd_add_hotplug_devices(), it is not returning the pdev or mfd_cell 
>> handles
>> to store in the amdgpu_isp and later use in suspend/resume to compare 
>> with incoming pdev->mfd_cell to detect the correct the device.
>>
>> The mfd-core is doing a kmemdup of mfd_cells data passed to 
>> mfd_add_hotplug_devices() to create the platform device.
>>
>> https://github.com/torvalds/linux/blob/master/drivers/mfd/mfd-core.c#L163 
>>
>>
>> I'm considering to add this function to check for valid isp mfd child 
>> devices that are allowed to do suspend-resume, this can minimize the 
>> checks, but still cannot eliminate the string comparsion, please let 
>> us know your thoughts.
>
> Well actually is a check needed at all?
>
> isp_v4_1_1_hw_suspend() calls device_for_each_child(isp->parent) which 
> is a platform device. Are there other children below that platform 
> device?
>
> The platform device was made explicitly for this purpose wasn't it?  
> So maybe drop the check overall?

yes, there are more children than the 3 isp mfd devices. Below is the list:

child-1: mfd_device (amd_isp_capture)
child-2: mfd_device (amd_isp_i2c_designware)
child-3: mfd_device (amdisp-pinctrl)
child-4: drm_minor
child-5: drm_minor

Without the check, suspend-resume will be called for every child of amdgpu.

We intend to call suspend-resume only for child-1 and 2 as these are the 
only devices registered with genpd to control ISP power.

I did a quick test removing the checks, device failed to wake up from 
resume, I can recheck and update if it is safe to remove the checks.

Thanks,

Pratap

>
>>
>> static bool is_valid_mfd_device(struct platform_device *pdev)
>> {
>>      const struct mfd_cell *mc = mfd_get_cell(pdev);
>>      if (!mc)
>>          return false;
>>      if (!strncmp(mc->name, "amdisp-pinctrl", 14))
>>          return false;
>>      return true;
>> }
>>
>> Thanks,
>>
>> Pratap
>>
>>>
>>>> +
>>>> +    return pm_runtime_force_suspend(dev);
>>>> +}
>>>> +
>>>> +static int isp_resume_device(struct device *dev, void *data)
>>>> +{
>>>> +    struct platform_device *pdev = container_of(dev, struct 
>>>> platform_device, dev);
>>>> +
>>>> +    if (!dev->type || !dev->type->name)
>>>> +        return 0;
>>>> +    if (strncmp(dev->type->name, "mfd_device", 10))
>>>> +        return 0;
>>>> +    if (!strncmp(pdev->mfd_cell->name, "amdisp-pinctrl", 14))
>>>> +        return 0;
>>>
>>> same comment as above
>>>
>>>> +
>>>> +    return pm_runtime_force_resume(dev);
>>>> +}
>>>> +
>>>> +static int isp_v4_1_1_hw_suspend(struct amdgpu_isp *isp)
>>>> +{
>>>> +    int r;
>>>> +
>>>> +    r = device_for_each_child(isp->parent, NULL,
>>>> +                  isp_suspend_device);
>>>> +    if (r)
>>>> +        dev_err(isp->parent, "failed to suspend hw devices 
>>>> (%d)\n", r);
>>>> +
>>>> +    return 0;
>>>
>>> Shouldn't you return r?
>>>
>>>> +}
>>>> +
>>>> +static int isp_v4_1_1_hw_resume(struct amdgpu_isp *isp)
>>>> +{
>>>> +    int r;
>>>> +
>>>> +    r = device_for_each_child(isp->parent, NULL,
>>>> +                  isp_resume_device);
>>>> +    if (r)
>>>> +        dev_err(isp->parent, "failed to resume hw device (%d)\n", r);
>>>> +
>>>> +    return 0;
>>>
>>> Shouldn't you return r?
>>>
>>>> +}
>>>> +
>>>>   static int isp_v4_1_1_hw_init(struct amdgpu_isp *isp)
>>>>   {
>>>>       const struct software_node *amd_camera_node, *isp4_node;
>>>> @@ -369,6 +426,8 @@ static int isp_v4_1_1_hw_fini(struct amdgpu_isp 
>>>> *isp)
>>>>   static const struct isp_funcs isp_v4_1_1_funcs = {
>>>>       .hw_init = isp_v4_1_1_hw_init,
>>>>       .hw_fini = isp_v4_1_1_hw_fini,
>>>> +    .hw_suspend = isp_v4_1_1_hw_suspend,
>>>> +    .hw_resume = isp_v4_1_1_hw_resume,
>>>>   };
>>>>     void isp_v4_1_1_set_isp_funcs(struct amdgpu_isp *isp)
>>>
>

  reply	other threads:[~2025-12-11  4:01 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-10  1:50 [PATCH] drm/amd/amdgpu: Fix SMU warning during isp suspend-resume Pratap Nirujogi
2025-12-10  3:28 ` Mario Limonciello
2025-12-10 15:27   ` Nirujogi, Pratap
2025-12-10 23:13   ` Nirujogi, Pratap
2025-12-11  2:19     ` Mario Limonciello
2025-12-11  4:01       ` Nirujogi, Pratap [this message]
2025-12-11 18:44         ` Nirujogi, Pratap
2025-12-11 14:33     ` Alex Deucher
2025-12-11 19:31       ` Nirujogi, Pratap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=772c2e4c-86b6-4200-9cbf-696e4652a6c0@amd.com \
    --to=pnirujog@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=benjamin.chan@amd.com \
    --cc=bin.du@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=dantony@amd.com \
    --cc=gjorgji.rosikopulos@amd.com \
    --cc=grosikop@amd.com \
    --cc=king.li@amd.com \
    --cc=mario.limonciello@amd.com \
    --cc=mlimonci@amd.com \
    --cc=phil.jawich@amd.com \
    --cc=pratap.nirujogi@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox