public inbox for linux-tegra@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
       [not found] ` <22630663.EfDdHjke4D@rjwysocki.net>
@ 2025-05-01  9:51   ` Jon Hunter
  2025-05-02 20:33     ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Jon Hunter @ 2025-05-01  9:51 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: LKML, Alan Stern, Ulf Hansson, Johan Hovold,
	Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

Hi Rafael,

On 14/03/2025 12:50, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> According to [1], the handling of device suspend and resume, and
> particularly the latter, involves unnecessary overhead related to
> starting new async work items for devices that cannot make progress
> right away because they have to wait for other devices.
> 
> To reduce this problem in the resume path, use the observation that
> starting the async resume of the children of a device after resuming
> the parent is likely to produce less scheduling and memory management
> noise than starting it upfront while at the same time it should not
> increase the resume duration substantially.
> 
> Accordingly, modify the code to start the async resume of the device's
> children when the processing of the parent has been completed in each
> stage of device resume and only start async resume upfront for devices
> without parents.
> 
> Also make it check if a given device can be resumed asynchronously
> before starting the synchronous resume of it in case it will have to
> wait for another that is already resuming asynchronously.
> 
> In addition to making the async resume of devices more friendly to
> systems with relatively less computing resources, this change is also
> preliminary for analogous changes in the suspend path.
> 
> On the systems where it has been tested, this change by itself does
> not affect the overall system resume duration in a measurable way.
> 
> Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
> Suggested-by: Saravana Kannan <saravanak@google.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


I have noticed a suspend regression with -next on a couple of our Tegra 
boards. Bisect was pointing to the following merge commit ...

# first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge 
branch 'pm-sleep' into linux-next

On top of next-20250429 I found that by reverting the following changes 
that suspend is working again ...

Revert "PM: sleep: Resume children after resuming the parent"
Revert "PM: sleep: Suspend async parents after suspending children"
Revert "PM: sleep: Make suspend of devices more asynchronous"

I have been looking into this a bit more to see what device is failing 
and by adding a bit of debug I found that entry to suspend was failing 
on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform 
when one of the I2C controllers (i2c@c240000) was being suspended.

I found that if I disable only this I2C controller in device-tree 
suspend worked again on top of -next. This I2C controller has 3 devices 
on the platform; two ina3221 devices and one Cypress Type-C controller. 
I then found that removing only the two ina3221 devices (in 
tegra194-p2888.dtsi) also allows suspend to work.

At this point, I am still unclear why this is now failing.  If you have 
any thoughts or things I can try please let me know.

Thanks!
Jon

-- 
nvpublic


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-01  9:51   ` [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent Jon Hunter
@ 2025-05-02 20:33     ` Rafael J. Wysocki
  2025-05-07 13:21       ` Jon Hunter
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-02 20:33 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

Hi Jon,

On Thu, May 1, 2025 at 11:51 AM Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Rafael,
>
> On 14/03/2025 12:50, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > According to [1], the handling of device suspend and resume, and
> > particularly the latter, involves unnecessary overhead related to
> > starting new async work items for devices that cannot make progress
> > right away because they have to wait for other devices.
> >
> > To reduce this problem in the resume path, use the observation that
> > starting the async resume of the children of a device after resuming
> > the parent is likely to produce less scheduling and memory management
> > noise than starting it upfront while at the same time it should not
> > increase the resume duration substantially.
> >
> > Accordingly, modify the code to start the async resume of the device's
> > children when the processing of the parent has been completed in each
> > stage of device resume and only start async resume upfront for devices
> > without parents.
> >
> > Also make it check if a given device can be resumed asynchronously
> > before starting the synchronous resume of it in case it will have to
> > wait for another that is already resuming asynchronously.
> >
> > In addition to making the async resume of devices more friendly to
> > systems with relatively less computing resources, this change is also
> > preliminary for analogous changes in the suspend path.
> >
> > On the systems where it has been tested, this change by itself does
> > not affect the overall system resume duration in a measurable way.
> >
> > Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
> > Suggested-by: Saravana Kannan <saravanak@google.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
>
> I have noticed a suspend regression with -next on a couple of our Tegra
> boards. Bisect was pointing to the following merge commit ...
>
> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge
> branch 'pm-sleep' into linux-next
>
> On top of next-20250429 I found that by reverting the following changes
> that suspend is working again ...
>
> Revert "PM: sleep: Resume children after resuming the parent"
> Revert "PM: sleep: Suspend async parents after suspending children"
> Revert "PM: sleep: Make suspend of devices more asynchronous"

I see.

Do all three commits need to be reverted to make things work again?
The first one only touches the resume path, so it would be surprising
if it caused a suspend regression to occur.

The most likely commit to cause this issue to happen is the second one
because it effectively changes the suspend ordering for "async"
devices.

> I have been looking into this a bit more to see what device is failing
> and by adding a bit of debug I found that entry to suspend was failing
> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform
> when one of the I2C controllers (i2c@c240000) was being suspended.
>
> I found that if I disable only this I2C controller in device-tree
> suspend worked again on top of -next. This I2C controller has 3 devices
> on the platform; two ina3221 devices and one Cypress Type-C controller.
> I then found that removing only the two ina3221 devices (in
> tegra194-p2888.dtsi) also allows suspend to work.
>
> At this point, I am still unclear why this is now failing.  If you have
> any thoughts or things I can try please let me know.

So are the devices in question "async"?  To check this, please see the
"async" attribute in the "power" subdirectory of the sysfs device
directory for each of them.

If they are "async", you can write "disable" to this attribute to turn
them into "sync" devices.  I'd do this and see what happens.

Overall, it looks like some dependencies aren't properly represented
by device links on this platform.

Thanks, Rafael

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-02 20:33     ` Rafael J. Wysocki
@ 2025-05-07 13:21       ` Jon Hunter
  2025-05-07 13:39         ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Jon Hunter @ 2025-05-07 13:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

Hi Rafael,

On 02/05/2025 21:33, Rafael J. Wysocki wrote:

...

>> I have noticed a suspend regression with -next on a couple of our Tegra
>> boards. Bisect was pointing to the following merge commit ...
>>
>> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge
>> branch 'pm-sleep' into linux-next
>>
>> On top of next-20250429 I found that by reverting the following changes
>> that suspend is working again ...
>>
>> Revert "PM: sleep: Resume children after resuming the parent"
>> Revert "PM: sleep: Suspend async parents after suspending children"
>> Revert "PM: sleep: Make suspend of devices more asynchronous"
> 
> I see.
> 
> Do all three commits need to be reverted to make things work again?
> The first one only touches the resume path, so it would be surprising
> if it caused a suspend regression to occur.

I had to revert the other 2 patches for the kernel to build. I tried to 
only revery this patch, and fix the build issue by defining the missing 
function and mutex, but that did not seem to work. The build worked, but 
suspend still failed. I am not sure if that is conclusive though.

> 
> The most likely commit to cause this issue to happen is the second one
> because it effectively changes the suspend ordering for "async"
> devices.
> 
>> I have been looking into this a bit more to see what device is failing
>> and by adding a bit of debug I found that entry to suspend was failing
>> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform
>> when one of the I2C controllers (i2c@c240000) was being suspended.
>>
>> I found that if I disable only this I2C controller in device-tree
>> suspend worked again on top of -next. This I2C controller has 3 devices
>> on the platform; two ina3221 devices and one Cypress Type-C controller.
>> I then found that removing only the two ina3221 devices (in
>> tegra194-p2888.dtsi) also allows suspend to work.
>>
>> At this point, I am still unclear why this is now failing.  If you have
>> any thoughts or things I can try please let me know.
> 
> So are the devices in question "async"?  To check this, please see the
> "async" attribute in the "power" subdirectory of the sysfs device
> directory for each of them.

I checked for both the I2C controller and ina3221 and don't see any 
'async' files ...

$ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status
$ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status
$ ls /sys/class/i2c-dev/i2c-2/power/
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status

> If they are "async", you can write "disable" to this attribute to turn
> them into "sync" devices.  I'd do this and see what happens.
> 
> Overall, it looks like some dependencies aren't properly represented
> by device links on this platform.


Yes that would appear to be the case, but at the moment, I don't see 
what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs 
when suspending I2C controller. Exactly where is still a mystery.

Jon

-- 
nvpublic


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 13:21       ` Jon Hunter
@ 2025-05-07 13:39         ` Rafael J. Wysocki
  2025-05-07 14:25           ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-07 13:39 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern,
	Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

Hi Jon,

On Wed, May 7, 2025 at 3:21 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Rafael,
>
> On 02/05/2025 21:33, Rafael J. Wysocki wrote:
>
> ...
>
> >> I have noticed a suspend regression with -next on a couple of our Tegra
> >> boards. Bisect was pointing to the following merge commit ...
> >>
> >> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge
> >> branch 'pm-sleep' into linux-next
> >>
> >> On top of next-20250429 I found that by reverting the following changes
> >> that suspend is working again ...
> >>
> >> Revert "PM: sleep: Resume children after resuming the parent"
> >> Revert "PM: sleep: Suspend async parents after suspending children"
> >> Revert "PM: sleep: Make suspend of devices more asynchronous"
> >
> > I see.
> >
> > Do all three commits need to be reverted to make things work again?
> > The first one only touches the resume path, so it would be surprising
> > if it caused a suspend regression to occur.
>
> I had to revert the other 2 patches for the kernel to build. I tried to
> only revery this patch, and fix the build issue by defining the missing
> function and mutex, but that did not seem to work. The build worked, but
> suspend still failed. I am not sure if that is conclusive though.

The "PM: sleep: Resume children after resuming the parent" patch is
unlikely to introduce the issue, so you should not need to revert it.

The issue is probably introduced by "PM: sleep: Suspend async parents
after suspending children".

> >
> > The most likely commit to cause this issue to happen is the second one
> > because it effectively changes the suspend ordering for "async"
> > devices.
> >
> >> I have been looking into this a bit more to see what device is failing
> >> and by adding a bit of debug I found that entry to suspend was failing
> >> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform
> >> when one of the I2C controllers (i2c@c240000) was being suspended.
> >>
> >> I found that if I disable only this I2C controller in device-tree
> >> suspend worked again on top of -next. This I2C controller has 3 devices
> >> on the platform; two ina3221 devices and one Cypress Type-C controller.
> >> I then found that removing only the two ina3221 devices (in
> >> tegra194-p2888.dtsi) also allows suspend to work.
> >>
> >> At this point, I am still unclear why this is now failing.  If you have
> >> any thoughts or things I can try please let me know.
> >
> > So are the devices in question "async"?  To check this, please see the
> > "async" attribute in the "power" subdirectory of the sysfs device
> > directory for each of them.
>
> I checked for both the I2C controller and ina3221 and don't see any
> 'async' files ...
>
> $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/
> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> control               runtime_status
> $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/
> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> control               runtime_status
> $ ls /sys/class/i2c-dev/i2c-2/power/
> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> control               runtime_status

You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug
attributes).

> > If they are "async", you can write "disable" to this attribute to turn
> > them into "sync" devices.  I'd do this and see what happens.

You may also turn off async suspend altogether:

# echo 0 > /sys/power/pm_async

and see if this helps.

> >
> > Overall, it looks like some dependencies aren't properly represented
> > by device links on this platform.
>
> Yes that would appear to be the case, but at the moment, I don't see
> what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs
> when suspending I2C controller. Exactly where is still a mystery.

So if it works with async suspend disabled, I would check the ordering
in which devices are suspended in that case (probably requires running
with initcall_debug in the kernel command line and echoing 1 to
/sys/power/pm_debug_messages).

Then compare it to the ordering in which devices are suspended with
async suspend enabled (for that, I would hack the I2C controller
suspend return an error early to avoid having to reboot the board
after it hangs).  All of the differences would be suspicious.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 13:39         ` Rafael J. Wysocki
@ 2025-05-07 14:25           ` Rafael J. Wysocki
  2025-05-07 14:39             ` Jon Hunter
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-07 14:25 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

On Wed, May 7, 2025 at 3:39 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Hi Jon,
>
> On Wed, May 7, 2025 at 3:21 PM Jon Hunter <jonathanh@nvidia.com> wrote:
> >
> > Hi Rafael,
> >
> > On 02/05/2025 21:33, Rafael J. Wysocki wrote:
> >
> > ...
> >
> > >> I have noticed a suspend regression with -next on a couple of our Tegra
> > >> boards. Bisect was pointing to the following merge commit ...
> > >>
> > >> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge
> > >> branch 'pm-sleep' into linux-next
> > >>
> > >> On top of next-20250429 I found that by reverting the following changes
> > >> that suspend is working again ...
> > >>
> > >> Revert "PM: sleep: Resume children after resuming the parent"
> > >> Revert "PM: sleep: Suspend async parents after suspending children"
> > >> Revert "PM: sleep: Make suspend of devices more asynchronous"
> > >
> > > I see.
> > >
> > > Do all three commits need to be reverted to make things work again?
> > > The first one only touches the resume path, so it would be surprising
> > > if it caused a suspend regression to occur.
> >
> > I had to revert the other 2 patches for the kernel to build. I tried to
> > only revery this patch, and fix the build issue by defining the missing
> > function and mutex, but that did not seem to work. The build worked, but
> > suspend still failed. I am not sure if that is conclusive though.
>
> The "PM: sleep: Resume children after resuming the parent" patch is
> unlikely to introduce the issue, so you should not need to revert it.
>
> The issue is probably introduced by "PM: sleep: Suspend async parents
> after suspending children".
>
> > >
> > > The most likely commit to cause this issue to happen is the second one
> > > because it effectively changes the suspend ordering for "async"
> > > devices.
> > >
> > >> I have been looking into this a bit more to see what device is failing
> > >> and by adding a bit of debug I found that entry to suspend was failing
> > >> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform
> > >> when one of the I2C controllers (i2c@c240000) was being suspended.
> > >>
> > >> I found that if I disable only this I2C controller in device-tree
> > >> suspend worked again on top of -next. This I2C controller has 3 devices
> > >> on the platform; two ina3221 devices and one Cypress Type-C controller.
> > >> I then found that removing only the two ina3221 devices (in
> > >> tegra194-p2888.dtsi) also allows suspend to work.
> > >>
> > >> At this point, I am still unclear why this is now failing.  If you have
> > >> any thoughts or things I can try please let me know.
> > >
> > > So are the devices in question "async"?  To check this, please see the
> > > "async" attribute in the "power" subdirectory of the sysfs device
> > > directory for each of them.
> >
> > I checked for both the I2C controller and ina3221 and don't see any
> > 'async' files ...
> >
> > $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/
> > autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> > control               runtime_status
> > $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/
> > autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> > control               runtime_status
> > $ ls /sys/class/i2c-dev/i2c-2/power/
> > autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> > control               runtime_status
>
> You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug
> attributes).
>
> > > If they are "async", you can write "disable" to this attribute to turn
> > > them into "sync" devices.  I'd do this and see what happens.
>
> You may also turn off async suspend altogether:
>
> # echo 0 > /sys/power/pm_async
>
> and see if this helps.
>
> > >
> > > Overall, it looks like some dependencies aren't properly represented
> > > by device links on this platform.
> >
> > Yes that would appear to be the case, but at the moment, I don't see
> > what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs
> > when suspending I2C controller. Exactly where is still a mystery.

I checked in the meantime and found that the i2c subsystem enables
async suspend/resume for all devices, clients and controllers, so the
devices in question are "async" AFAICS.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 14:25           ` Rafael J. Wysocki
@ 2025-05-07 14:39             ` Jon Hunter
  2025-05-07 14:56               ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Jon Hunter @ 2025-05-07 14:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org


On 07/05/2025 15:25, Rafael J. Wysocki wrote:

...

>>>> So are the devices in question "async"?  To check this, please see the
>>>> "async" attribute in the "power" subdirectory of the sysfs device
>>>> directory for each of them.
>>>
>>> I checked for both the I2C controller and ina3221 and don't see any
>>> 'async' files ...
>>>
>>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/
>>> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
>>> control               runtime_status
>>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/
>>> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
>>> control               runtime_status
>>> $ ls /sys/class/i2c-dev/i2c-2/power/
>>> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
>>> control               runtime_status
>>
>> You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug
>> attributes).
>>
>>>> If they are "async", you can write "disable" to this attribute to turn
>>>> them into "sync" devices.  I'd do this and see what happens.
>>
>> You may also turn off async suspend altogether:
>>
>> # echo 0 > /sys/power/pm_async
>>
>> and see if this helps.

This does indeed help!

>>>> Overall, it looks like some dependencies aren't properly represented
>>>> by device links on this platform.
>>>
>>> Yes that would appear to be the case, but at the moment, I don't see
>>> what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs
>>> when suspending I2C controller. Exactly where is still a mystery.
> 
> I checked in the meantime and found that the i2c subsystem enables
> async suspend/resume for all devices, clients and controllers, so the
> devices in question are "async" AFAICS.

So that would make sense given that the above works.

When it fails it appears to hang in dpm_wait_for_subordinate() when 
calling dpm_wait_for_children() from what I can tell.

I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C 
itself non-async works.

Jon

-- 
nvpublic


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 14:39             ` Jon Hunter
@ 2025-05-07 14:56               ` Rafael J. Wysocki
  2025-05-07 15:39                 ` Jon Hunter
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-07 14:56 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern,
	Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

On Wed, May 7, 2025 at 4:39 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 07/05/2025 15:25, Rafael J. Wysocki wrote:
>
> ...
>
> >>>> So are the devices in question "async"?  To check this, please see the
> >>>> "async" attribute in the "power" subdirectory of the sysfs device
> >>>> directory for each of them.
> >>>
> >>> I checked for both the I2C controller and ina3221 and don't see any
> >>> 'async' files ...
> >>>
> >>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/
> >>> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> >>> control               runtime_status
> >>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/
> >>> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> >>> control               runtime_status
> >>> $ ls /sys/class/i2c-dev/i2c-2/power/
> >>> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> >>> control               runtime_status
> >>
> >> You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug
> >> attributes).
> >>
> >>>> If they are "async", you can write "disable" to this attribute to turn
> >>>> them into "sync" devices.  I'd do this and see what happens.
> >>
> >> You may also turn off async suspend altogether:
> >>
> >> # echo 0 > /sys/power/pm_async
> >>
> >> and see if this helps.
>
> This does indeed help!
>
> >>>> Overall, it looks like some dependencies aren't properly represented
> >>>> by device links on this platform.
> >>>
> >>> Yes that would appear to be the case, but at the moment, I don't see
> >>> what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs
> >>> when suspending I2C controller. Exactly where is still a mystery.
> >
> > I checked in the meantime and found that the i2c subsystem enables
> > async suspend/resume for all devices, clients and controllers, so the
> > devices in question are "async" AFAICS.
>
> So that would make sense given that the above works.
>
> When it fails it appears to hang in dpm_wait_for_subordinate() when
> calling dpm_wait_for_children() from what I can tell.

So apparently one of the children has not been suspended yet when this
happens.  That's fine because it should be suspended at one point and
the parent suspend should be unblocked, so it looks like the child
suspend doesn't complete for some reason.

> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C
> itself non-async works.

What probably happens is that after the "PM: sleep: Suspend async
parents after suspending children" , the i2c clients are suspended
upfront (because they have no children) and when one of them has
suspended, it triggers a parent suspend.  The parent suspend then
waits for the other client to complete suspending, but that cannot
make progress for some reason.

Before that patch, the i2c clients would have suspended only after all
of the "sync" devices following them in dpm_list had been suspended
(the list is processed in the reverse order during suspend), so it
looks like there is a hidden dependency between one of the i2c clients
and a "sync" device.

If the above supposition is right, flagging the i2c client as "sync"
will make the problem go away.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 14:56               ` Rafael J. Wysocki
@ 2025-05-07 15:39                 ` Jon Hunter
  2025-05-07 16:43                   ` Rafael J. Wysocki
  2025-05-10 11:39                   ` Rafael J. Wysocki
  0 siblings, 2 replies; 13+ messages in thread
From: Jon Hunter @ 2025-05-07 15:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org



On 07/05/2025 15:56, Rafael J. Wysocki wrote:

...

> So apparently one of the children has not been suspended yet when this
> happens.  That's fine because it should be suspended at one point and
> the parent suspend should be unblocked, so it looks like the child
> suspend doesn't complete for some reason.
> 
>> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C
>> itself non-async works.
> 
> What probably happens is that after the "PM: sleep: Suspend async
> parents after suspending children" , the i2c clients are suspended
> upfront (because they have no children) and when one of them has
> suspended, it triggers a parent suspend.  The parent suspend then
> waits for the other client to complete suspending, but that cannot
> make progress for some reason.
> 
> Before that patch, the i2c clients would have suspended only after all
> of the "sync" devices following them in dpm_list had been suspended
> (the list is processed in the reverse order during suspend), so it
> looks like there is a hidden dependency between one of the i2c clients
> and a "sync" device.
> 
> If the above supposition is right, flagging the i2c client as "sync"
> will make the problem go away.

So all the I2C controllers are 'sync' devices ...

$ cat /sys/class/i2c-dev/i2c-*/power/async
disabled
disabled
disabled
disabled
disabled
disabled
disabled

The I2C clients on the problematic I2C controller are all 'async' 
devices ...

$ cat /sys/class/i2c-dev/i2c-2/device/2-*/power/async
enabled
enabled
enabled

Setting all these to 'disabled' fixes the problem. However, also just 
setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221 
devices seem to be fine being async). The 'cypd4226' device is 
interesting, because this one is a USB Type-C controller and there is a 
circular dependency between the Type-C and USB PHY (see 
arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts).

If I make the following change then this does fix it ...

diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c 
b/drivers/usb/typec/ucsi/ucsi_ccg.c
index f01e4ef6619d..e9a9df1431af 100644
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client)

         i2c_set_clientdata(client, uc);

+       device_disable_async_suspend(uc->dev);
+
         pm_runtime_set_active(uc->dev);
         pm_runtime_enable(uc->dev);
         pm_runtime_use_autosuspend(uc->dev);

Is this the right fix for this?

Jon

-- 
nvpublic


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 15:39                 ` Jon Hunter
@ 2025-05-07 16:43                   ` Rafael J. Wysocki
  2025-05-08 13:38                     ` Jon Hunter
  2025-05-10 11:39                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-07 16:43 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern,
	Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

On Wed, May 7, 2025 at 5:40 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
>
> On 07/05/2025 15:56, Rafael J. Wysocki wrote:
>
> ...
>
> > So apparently one of the children has not been suspended yet when this
> > happens.  That's fine because it should be suspended at one point and
> > the parent suspend should be unblocked, so it looks like the child
> > suspend doesn't complete for some reason.
> >
> >> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C
> >> itself non-async works.
> >
> > What probably happens is that after the "PM: sleep: Suspend async
> > parents after suspending children" , the i2c clients are suspended
> > upfront (because they have no children) and when one of them has
> > suspended, it triggers a parent suspend.  The parent suspend then
> > waits for the other client to complete suspending, but that cannot
> > make progress for some reason.
> >
> > Before that patch, the i2c clients would have suspended only after all
> > of the "sync" devices following them in dpm_list had been suspended
> > (the list is processed in the reverse order during suspend), so it
> > looks like there is a hidden dependency between one of the i2c clients
> > and a "sync" device.
> >
> > If the above supposition is right, flagging the i2c client as "sync"
> > will make the problem go away.
>
> So all the I2C controllers are 'sync' devices ...
>
> $ cat /sys/class/i2c-dev/i2c-*/power/async
> disabled
> disabled
> disabled
> disabled
> disabled
> disabled
> disabled
>
> The I2C clients on the problematic I2C controller are all 'async'
> devices ...
>
> $ cat /sys/class/i2c-dev/i2c-2/device/2-*/power/async
> enabled
> enabled
> enabled
>
> Setting all these to 'disabled' fixes the problem. However, also just
> setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221
> devices seem to be fine being async). The 'cypd4226' device is
> interesting, because this one is a USB Type-C controller and there is a
> circular dependency between the Type-C and USB PHY (see
> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts).

Circular dependencies are problematic for suspend/resume in general.
I wonder if fw_devlink can resolve this?

> If I make the following change then this does fix it ...
>
> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c
> b/drivers/usb/typec/ucsi/ucsi_ccg.c
> index f01e4ef6619d..e9a9df1431af 100644
> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c
> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client)
>
>          i2c_set_clientdata(client, uc);
>
> +       device_disable_async_suspend(uc->dev);
> +
>          pm_runtime_set_active(uc->dev);
>          pm_runtime_enable(uc->dev);
>          pm_runtime_use_autosuspend(uc->dev);
>
> Is this the right fix for this?

At least as a stop-gap, yes.

In order to enable async suspend for a device, one needs to at least
assume with sufficiently high confidence that it will be safe to
reorder it with respect to any other device in the system except for
the devices having known dependencies on the device in question.
Those known dependencies either are parent-child connections or they
need to be represented by device links.

In this particular case, it is painfully clear that the suspend of the
device in question cannot be reordered with respect to at least one
other device where the dependency is not known in the above sense.

Thus the device in question should not be allowed to suspend asynchronously.

Would it be better to represent the dependency in question via a
device link?  Yes, it would, but until that happens, disabling async
suspend is the right thing to do IMV.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 16:43                   ` Rafael J. Wysocki
@ 2025-05-08 13:38                     ` Jon Hunter
  2025-05-08 18:06                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Jon Hunter @ 2025-05-08 13:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org


On 07/05/2025 17:43, Rafael J. Wysocki wrote:

...

>> Setting all these to 'disabled' fixes the problem. However, also just
>> setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221
>> devices seem to be fine being async). The 'cypd4226' device is
>> interesting, because this one is a USB Type-C controller and there is a
>> circular dependency between the Type-C and USB PHY (see
>> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts).
> 
> Circular dependencies are problematic for suspend/resume in general.
> I wonder if fw_devlink can resolve this?

I booted with fw_devlink=on, but this did not resolve the problem :-(

>> If I make the following change then this does fix it ...
>>
>> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c
>> b/drivers/usb/typec/ucsi/ucsi_ccg.c
>> index f01e4ef6619d..e9a9df1431af 100644
>> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c
>> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
>> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client)
>>
>>           i2c_set_clientdata(client, uc);
>>
>> +       device_disable_async_suspend(uc->dev);
>> +
>>           pm_runtime_set_active(uc->dev);
>>           pm_runtime_enable(uc->dev);
>>           pm_runtime_use_autosuspend(uc->dev);
>>
>> Is this the right fix for this?
> 
> At least as a stop-gap, yes.
> 
> In order to enable async suspend for a device, one needs to at least
> assume with sufficiently high confidence that it will be safe to
> reorder it with respect to any other device in the system except for
> the devices having known dependencies on the device in question.
> Those known dependencies either are parent-child connections or they
> need to be represented by device links.
> 
> In this particular case, it is painfully clear that the suspend of the
> device in question cannot be reordered with respect to at least one
> other device where the dependency is not known in the above sense.
> 
> Thus the device in question should not be allowed to suspend asynchronously.
> 
> Would it be better to represent the dependency in question via a
> device link?  Yes, it would, but until that happens, disabling async
> suspend is the right thing to do IMV.

OK so it is not clear to me if the PM core should be handling this for 
us. Is that what you are implying/suggesting?

Thanks
Jon

-- 
nvpublic


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-08 13:38                     ` Jon Hunter
@ 2025-05-08 18:06                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-08 18:06 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern,
	Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

On Thu, May 8, 2025 at 3:38 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 07/05/2025 17:43, Rafael J. Wysocki wrote:
>
> ...
>
> >> Setting all these to 'disabled' fixes the problem. However, also just
> >> setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221
> >> devices seem to be fine being async). The 'cypd4226' device is
> >> interesting, because this one is a USB Type-C controller and there is a
> >> circular dependency between the Type-C and USB PHY (see
> >> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts).
> >
> > Circular dependencies are problematic for suspend/resume in general.
> > I wonder if fw_devlink can resolve this?
>
> I booted with fw_devlink=on, but this did not resolve the problem :-(

I see, but thanks for checking.

> >> If I make the following change then this does fix it ...
> >>
> >> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c
> >> b/drivers/usb/typec/ucsi/ucsi_ccg.c
> >> index f01e4ef6619d..e9a9df1431af 100644
> >> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c
> >> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
> >> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client)
> >>
> >>           i2c_set_clientdata(client, uc);
> >>
> >> +       device_disable_async_suspend(uc->dev);
> >> +
> >>           pm_runtime_set_active(uc->dev);
> >>           pm_runtime_enable(uc->dev);
> >>           pm_runtime_use_autosuspend(uc->dev);
> >>
> >> Is this the right fix for this?
> >
> > At least as a stop-gap, yes.
> >
> > In order to enable async suspend for a device, one needs to at least
> > assume with sufficiently high confidence that it will be safe to
> > reorder it with respect to any other device in the system except for
> > the devices having known dependencies on the device in question.
> > Those known dependencies either are parent-child connections or they
> > need to be represented by device links.
> >
> > In this particular case, it is painfully clear that the suspend of the
> > device in question cannot be reordered with respect to at least one
> > other device where the dependency is not known in the above sense.
> >
> > Thus the device in question should not be allowed to suspend asynchronously.
> >
> > Would it be better to represent the dependency in question via a
> > device link?  Yes, it would, but until that happens, disabling async
> > suspend is the right thing to do IMV.
>
> OK so it is not clear to me if the PM core should be handling this for
> us.

No, it can't because dependency information is missing.

> Is that what you are implying/suggesting?

No.

Async suspend needs to be disabled for this device for now I'm afraid
and I don't think that it will make suspend/resume take much more
time.

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-07 15:39                 ` Jon Hunter
  2025-05-07 16:43                   ` Rafael J. Wysocki
@ 2025-05-10 11:39                   ` Rafael J. Wysocki
  2025-05-10 11:50                     ` Jon Hunter
  1 sibling, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2025-05-10 11:39 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern,
	Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org

On Wed, May 7, 2025 at 5:40 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
>
> On 07/05/2025 15:56, Rafael J. Wysocki wrote:
>
> ...
>
> > So apparently one of the children has not been suspended yet when this
> > happens.  That's fine because it should be suspended at one point and
> > the parent suspend should be unblocked, so it looks like the child
> > suspend doesn't complete for some reason.
> >
> >> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C
> >> itself non-async works.
> >
> > What probably happens is that after the "PM: sleep: Suspend async
> > parents after suspending children" , the i2c clients are suspended
> > upfront (because they have no children) and when one of them has
> > suspended, it triggers a parent suspend.  The parent suspend then
> > waits for the other client to complete suspending, but that cannot
> > make progress for some reason.
> >
> > Before that patch, the i2c clients would have suspended only after all
> > of the "sync" devices following them in dpm_list had been suspended
> > (the list is processed in the reverse order during suspend), so it
> > looks like there is a hidden dependency between one of the i2c clients
> > and a "sync" device.
> >
> > If the above supposition is right, flagging the i2c client as "sync"
> > will make the problem go away.
>
> So all the I2C controllers are 'sync' devices ...
>
> $ cat /sys/class/i2c-dev/i2c-*/power/async
> disabled
> disabled
> disabled
> disabled
> disabled
> disabled
> disabled
>
> The I2C clients on the problematic I2C controller are all 'async'
> devices ...
>
> $ cat /sys/class/i2c-dev/i2c-2/device/2-*/power/async
> enabled
> enabled
> enabled
>
> Setting all these to 'disabled' fixes the problem. However, also just
> setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221
> devices seem to be fine being async). The 'cypd4226' device is
> interesting, because this one is a USB Type-C controller and there is a
> circular dependency between the Type-C and USB PHY (see
> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts).
>
> If I make the following change then this does fix it ...
>
> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c
> b/drivers/usb/typec/ucsi/ucsi_ccg.c
> index f01e4ef6619d..e9a9df1431af 100644
> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c
> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client)
>
>          i2c_set_clientdata(client, uc);
>
> +       device_disable_async_suspend(uc->dev);
> +
>          pm_runtime_set_active(uc->dev);
>          pm_runtime_enable(uc->dev);
>          pm_runtime_use_autosuspend(uc->dev);
>

I'm going to pick up this patch for 6.16 and add a changelog to it, so
please consider providing a Signed-off-by: tag for this change.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
  2025-05-10 11:39                   ` Rafael J. Wysocki
@ 2025-05-10 11:50                     ` Jon Hunter
  0 siblings, 0 replies; 13+ messages in thread
From: Jon Hunter @ 2025-05-10 11:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson,
	Johan Hovold, Manivannan Sadhasivam, Saravana Kannan,
	linux-tegra@vger.kernel.org


On 10/05/2025 12:39, Rafael J. Wysocki wrote:

...

>> If I make the following change then this does fix it ...
>>
>> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c
>> b/drivers/usb/typec/ucsi/ucsi_ccg.c
>> index f01e4ef6619d..e9a9df1431af 100644
>> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c
>> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
>> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client)
>>
>>           i2c_set_clientdata(client, uc);
>>
>> +       device_disable_async_suspend(uc->dev);
>> +
>>           pm_runtime_set_active(uc->dev);
>>           pm_runtime_enable(uc->dev);
>>           pm_runtime_use_autosuspend(uc->dev);
>>
> 
> I'm going to pick up this patch for 6.16 and add a changelog to it, so
> please consider providing a Signed-off-by: tag for this change.

OK thanks! Yes, please add my ...

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>

Jon

-- 
nvpublic


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-05-10 11:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <10629535.nUPlyArG6x@rjwysocki.net>
     [not found] ` <22630663.EfDdHjke4D@rjwysocki.net>
2025-05-01  9:51   ` [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent Jon Hunter
2025-05-02 20:33     ` Rafael J. Wysocki
2025-05-07 13:21       ` Jon Hunter
2025-05-07 13:39         ` Rafael J. Wysocki
2025-05-07 14:25           ` Rafael J. Wysocki
2025-05-07 14:39             ` Jon Hunter
2025-05-07 14:56               ` Rafael J. Wysocki
2025-05-07 15:39                 ` Jon Hunter
2025-05-07 16:43                   ` Rafael J. Wysocki
2025-05-08 13:38                     ` Jon Hunter
2025-05-08 18:06                       ` Rafael J. Wysocki
2025-05-10 11:39                   ` Rafael J. Wysocki
2025-05-10 11:50                     ` Jon Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox