* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent [not found] ` <22630663.EfDdHjke4D@rjwysocki.net> @ 2025-05-01 9:51 ` Jon Hunter 2025-05-02 20:33 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Jon Hunter @ 2025-05-01 9:51 UTC (permalink / raw) To: Rafael J. Wysocki, Linux PM Cc: LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org Hi Rafael, On 14/03/2025 12:50, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > According to [1], the handling of device suspend and resume, and > particularly the latter, involves unnecessary overhead related to > starting new async work items for devices that cannot make progress > right away because they have to wait for other devices. > > To reduce this problem in the resume path, use the observation that > starting the async resume of the children of a device after resuming > the parent is likely to produce less scheduling and memory management > noise than starting it upfront while at the same time it should not > increase the resume duration substantially. > > Accordingly, modify the code to start the async resume of the device's > children when the processing of the parent has been completed in each > stage of device resume and only start async resume upfront for devices > without parents. > > Also make it check if a given device can be resumed asynchronously > before starting the synchronous resume of it in case it will have to > wait for another that is already resuming asynchronously. > > In addition to making the async resume of devices more friendly to > systems with relatively less computing resources, this change is also > preliminary for analogous changes in the suspend path. > > On the systems where it has been tested, this change by itself does > not affect the overall system resume duration in a measurable way. > > Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1] > Suggested-by: Saravana Kannan <saravanak@google.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> I have noticed a suspend regression with -next on a couple of our Tegra boards. Bisect was pointing to the following merge commit ... # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge branch 'pm-sleep' into linux-next On top of next-20250429 I found that by reverting the following changes that suspend is working again ... Revert "PM: sleep: Resume children after resuming the parent" Revert "PM: sleep: Suspend async parents after suspending children" Revert "PM: sleep: Make suspend of devices more asynchronous" I have been looking into this a bit more to see what device is failing and by adding a bit of debug I found that entry to suspend was failing on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform when one of the I2C controllers (i2c@c240000) was being suspended. I found that if I disable only this I2C controller in device-tree suspend worked again on top of -next. This I2C controller has 3 devices on the platform; two ina3221 devices and one Cypress Type-C controller. I then found that removing only the two ina3221 devices (in tegra194-p2888.dtsi) also allows suspend to work. At this point, I am still unclear why this is now failing. If you have any thoughts or things I can try please let me know. Thanks! Jon -- nvpublic ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-01 9:51 ` [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent Jon Hunter @ 2025-05-02 20:33 ` Rafael J. Wysocki 2025-05-07 13:21 ` Jon Hunter 0 siblings, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-02 20:33 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org Hi Jon, On Thu, May 1, 2025 at 11:51 AM Jon Hunter <jonathanh@nvidia.com> wrote: > > Hi Rafael, > > On 14/03/2025 12:50, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > According to [1], the handling of device suspend and resume, and > > particularly the latter, involves unnecessary overhead related to > > starting new async work items for devices that cannot make progress > > right away because they have to wait for other devices. > > > > To reduce this problem in the resume path, use the observation that > > starting the async resume of the children of a device after resuming > > the parent is likely to produce less scheduling and memory management > > noise than starting it upfront while at the same time it should not > > increase the resume duration substantially. > > > > Accordingly, modify the code to start the async resume of the device's > > children when the processing of the parent has been completed in each > > stage of device resume and only start async resume upfront for devices > > without parents. > > > > Also make it check if a given device can be resumed asynchronously > > before starting the synchronous resume of it in case it will have to > > wait for another that is already resuming asynchronously. > > > > In addition to making the async resume of devices more friendly to > > systems with relatively less computing resources, this change is also > > preliminary for analogous changes in the suspend path. > > > > On the systems where it has been tested, this change by itself does > > not affect the overall system resume duration in a measurable way. > > > > Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1] > > Suggested-by: Saravana Kannan <saravanak@google.com> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > I have noticed a suspend regression with -next on a couple of our Tegra > boards. Bisect was pointing to the following merge commit ... > > # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge > branch 'pm-sleep' into linux-next > > On top of next-20250429 I found that by reverting the following changes > that suspend is working again ... > > Revert "PM: sleep: Resume children after resuming the parent" > Revert "PM: sleep: Suspend async parents after suspending children" > Revert "PM: sleep: Make suspend of devices more asynchronous" I see. Do all three commits need to be reverted to make things work again? The first one only touches the resume path, so it would be surprising if it caused a suspend regression to occur. The most likely commit to cause this issue to happen is the second one because it effectively changes the suspend ordering for "async" devices. > I have been looking into this a bit more to see what device is failing > and by adding a bit of debug I found that entry to suspend was failing > on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform > when one of the I2C controllers (i2c@c240000) was being suspended. > > I found that if I disable only this I2C controller in device-tree > suspend worked again on top of -next. This I2C controller has 3 devices > on the platform; two ina3221 devices and one Cypress Type-C controller. > I then found that removing only the two ina3221 devices (in > tegra194-p2888.dtsi) also allows suspend to work. > > At this point, I am still unclear why this is now failing. If you have > any thoughts or things I can try please let me know. So are the devices in question "async"? To check this, please see the "async" attribute in the "power" subdirectory of the sysfs device directory for each of them. If they are "async", you can write "disable" to this attribute to turn them into "sync" devices. I'd do this and see what happens. Overall, it looks like some dependencies aren't properly represented by device links on this platform. Thanks, Rafael ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-02 20:33 ` Rafael J. Wysocki @ 2025-05-07 13:21 ` Jon Hunter 2025-05-07 13:39 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Jon Hunter @ 2025-05-07 13:21 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org Hi Rafael, On 02/05/2025 21:33, Rafael J. Wysocki wrote: ... >> I have noticed a suspend regression with -next on a couple of our Tegra >> boards. Bisect was pointing to the following merge commit ... >> >> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge >> branch 'pm-sleep' into linux-next >> >> On top of next-20250429 I found that by reverting the following changes >> that suspend is working again ... >> >> Revert "PM: sleep: Resume children after resuming the parent" >> Revert "PM: sleep: Suspend async parents after suspending children" >> Revert "PM: sleep: Make suspend of devices more asynchronous" > > I see. > > Do all three commits need to be reverted to make things work again? > The first one only touches the resume path, so it would be surprising > if it caused a suspend regression to occur. I had to revert the other 2 patches for the kernel to build. I tried to only revery this patch, and fix the build issue by defining the missing function and mutex, but that did not seem to work. The build worked, but suspend still failed. I am not sure if that is conclusive though. > > The most likely commit to cause this issue to happen is the second one > because it effectively changes the suspend ordering for "async" > devices. > >> I have been looking into this a bit more to see what device is failing >> and by adding a bit of debug I found that entry to suspend was failing >> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform >> when one of the I2C controllers (i2c@c240000) was being suspended. >> >> I found that if I disable only this I2C controller in device-tree >> suspend worked again on top of -next. This I2C controller has 3 devices >> on the platform; two ina3221 devices and one Cypress Type-C controller. >> I then found that removing only the two ina3221 devices (in >> tegra194-p2888.dtsi) also allows suspend to work. >> >> At this point, I am still unclear why this is now failing. If you have >> any thoughts or things I can try please let me know. > > So are the devices in question "async"? To check this, please see the > "async" attribute in the "power" subdirectory of the sysfs device > directory for each of them. I checked for both the I2C controller and ina3221 and don't see any 'async' files ... $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/ autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/ autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status $ ls /sys/class/i2c-dev/i2c-2/power/ autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status > If they are "async", you can write "disable" to this attribute to turn > them into "sync" devices. I'd do this and see what happens. > > Overall, it looks like some dependencies aren't properly represented > by device links on this platform. Yes that would appear to be the case, but at the moment, I don't see what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs when suspending I2C controller. Exactly where is still a mystery. Jon -- nvpublic ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 13:21 ` Jon Hunter @ 2025-05-07 13:39 ` Rafael J. Wysocki 2025-05-07 14:25 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-07 13:39 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org Hi Jon, On Wed, May 7, 2025 at 3:21 PM Jon Hunter <jonathanh@nvidia.com> wrote: > > Hi Rafael, > > On 02/05/2025 21:33, Rafael J. Wysocki wrote: > > ... > > >> I have noticed a suspend regression with -next on a couple of our Tegra > >> boards. Bisect was pointing to the following merge commit ... > >> > >> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge > >> branch 'pm-sleep' into linux-next > >> > >> On top of next-20250429 I found that by reverting the following changes > >> that suspend is working again ... > >> > >> Revert "PM: sleep: Resume children after resuming the parent" > >> Revert "PM: sleep: Suspend async parents after suspending children" > >> Revert "PM: sleep: Make suspend of devices more asynchronous" > > > > I see. > > > > Do all three commits need to be reverted to make things work again? > > The first one only touches the resume path, so it would be surprising > > if it caused a suspend regression to occur. > > I had to revert the other 2 patches for the kernel to build. I tried to > only revery this patch, and fix the build issue by defining the missing > function and mutex, but that did not seem to work. The build worked, but > suspend still failed. I am not sure if that is conclusive though. The "PM: sleep: Resume children after resuming the parent" patch is unlikely to introduce the issue, so you should not need to revert it. The issue is probably introduced by "PM: sleep: Suspend async parents after suspending children". > > > > The most likely commit to cause this issue to happen is the second one > > because it effectively changes the suspend ordering for "async" > > devices. > > > >> I have been looking into this a bit more to see what device is failing > >> and by adding a bit of debug I found that entry to suspend was failing > >> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform > >> when one of the I2C controllers (i2c@c240000) was being suspended. > >> > >> I found that if I disable only this I2C controller in device-tree > >> suspend worked again on top of -next. This I2C controller has 3 devices > >> on the platform; two ina3221 devices and one Cypress Type-C controller. > >> I then found that removing only the two ina3221 devices (in > >> tegra194-p2888.dtsi) also allows suspend to work. > >> > >> At this point, I am still unclear why this is now failing. If you have > >> any thoughts or things I can try please let me know. > > > > So are the devices in question "async"? To check this, please see the > > "async" attribute in the "power" subdirectory of the sysfs device > > directory for each of them. > > I checked for both the I2C controller and ina3221 and don't see any > 'async' files ... > > $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/ > autosuspend_delay_ms runtime_active_time runtime_suspended_time > control runtime_status > $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/ > autosuspend_delay_ms runtime_active_time runtime_suspended_time > control runtime_status > $ ls /sys/class/i2c-dev/i2c-2/power/ > autosuspend_delay_ms runtime_active_time runtime_suspended_time > control runtime_status You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug attributes). > > If they are "async", you can write "disable" to this attribute to turn > > them into "sync" devices. I'd do this and see what happens. You may also turn off async suspend altogether: # echo 0 > /sys/power/pm_async and see if this helps. > > > > Overall, it looks like some dependencies aren't properly represented > > by device links on this platform. > > Yes that would appear to be the case, but at the moment, I don't see > what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs > when suspending I2C controller. Exactly where is still a mystery. So if it works with async suspend disabled, I would check the ordering in which devices are suspended in that case (probably requires running with initcall_debug in the kernel command line and echoing 1 to /sys/power/pm_debug_messages). Then compare it to the ordering in which devices are suspended with async suspend enabled (for that, I would hack the I2C controller suspend return an error early to avoid having to reboot the board after it hangs). All of the differences would be suspicious. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 13:39 ` Rafael J. Wysocki @ 2025-05-07 14:25 ` Rafael J. Wysocki 2025-05-07 14:39 ` Jon Hunter 0 siblings, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-07 14:25 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On Wed, May 7, 2025 at 3:39 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > Hi Jon, > > On Wed, May 7, 2025 at 3:21 PM Jon Hunter <jonathanh@nvidia.com> wrote: > > > > Hi Rafael, > > > > On 02/05/2025 21:33, Rafael J. Wysocki wrote: > > > > ... > > > > >> I have noticed a suspend regression with -next on a couple of our Tegra > > >> boards. Bisect was pointing to the following merge commit ... > > >> > > >> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge > > >> branch 'pm-sleep' into linux-next > > >> > > >> On top of next-20250429 I found that by reverting the following changes > > >> that suspend is working again ... > > >> > > >> Revert "PM: sleep: Resume children after resuming the parent" > > >> Revert "PM: sleep: Suspend async parents after suspending children" > > >> Revert "PM: sleep: Make suspend of devices more asynchronous" > > > > > > I see. > > > > > > Do all three commits need to be reverted to make things work again? > > > The first one only touches the resume path, so it would be surprising > > > if it caused a suspend regression to occur. > > > > I had to revert the other 2 patches for the kernel to build. I tried to > > only revery this patch, and fix the build issue by defining the missing > > function and mutex, but that did not seem to work. The build worked, but > > suspend still failed. I am not sure if that is conclusive though. > > The "PM: sleep: Resume children after resuming the parent" patch is > unlikely to introduce the issue, so you should not need to revert it. > > The issue is probably introduced by "PM: sleep: Suspend async parents > after suspending children". > > > > > > > The most likely commit to cause this issue to happen is the second one > > > because it effectively changes the suspend ordering for "async" > > > devices. > > > > > >> I have been looking into this a bit more to see what device is failing > > >> and by adding a bit of debug I found that entry to suspend was failing > > >> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform > > >> when one of the I2C controllers (i2c@c240000) was being suspended. > > >> > > >> I found that if I disable only this I2C controller in device-tree > > >> suspend worked again on top of -next. This I2C controller has 3 devices > > >> on the platform; two ina3221 devices and one Cypress Type-C controller. > > >> I then found that removing only the two ina3221 devices (in > > >> tegra194-p2888.dtsi) also allows suspend to work. > > >> > > >> At this point, I am still unclear why this is now failing. If you have > > >> any thoughts or things I can try please let me know. > > > > > > So are the devices in question "async"? To check this, please see the > > > "async" attribute in the "power" subdirectory of the sysfs device > > > directory for each of them. > > > > I checked for both the I2C controller and ina3221 and don't see any > > 'async' files ... > > > > $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/ > > autosuspend_delay_ms runtime_active_time runtime_suspended_time > > control runtime_status > > $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/ > > autosuspend_delay_ms runtime_active_time runtime_suspended_time > > control runtime_status > > $ ls /sys/class/i2c-dev/i2c-2/power/ > > autosuspend_delay_ms runtime_active_time runtime_suspended_time > > control runtime_status > > You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug > attributes). > > > > If they are "async", you can write "disable" to this attribute to turn > > > them into "sync" devices. I'd do this and see what happens. > > You may also turn off async suspend altogether: > > # echo 0 > /sys/power/pm_async > > and see if this helps. > > > > > > > Overall, it looks like some dependencies aren't properly represented > > > by device links on this platform. > > > > Yes that would appear to be the case, but at the moment, I don't see > > what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs > > when suspending I2C controller. Exactly where is still a mystery. I checked in the meantime and found that the i2c subsystem enables async suspend/resume for all devices, clients and controllers, so the devices in question are "async" AFAICS. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 14:25 ` Rafael J. Wysocki @ 2025-05-07 14:39 ` Jon Hunter 2025-05-07 14:56 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Jon Hunter @ 2025-05-07 14:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On 07/05/2025 15:25, Rafael J. Wysocki wrote: ... >>>> So are the devices in question "async"? To check this, please see the >>>> "async" attribute in the "power" subdirectory of the sysfs device >>>> directory for each of them. >>> >>> I checked for both the I2C controller and ina3221 and don't see any >>> 'async' files ... >>> >>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/ >>> autosuspend_delay_ms runtime_active_time runtime_suspended_time >>> control runtime_status >>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/ >>> autosuspend_delay_ms runtime_active_time runtime_suspended_time >>> control runtime_status >>> $ ls /sys/class/i2c-dev/i2c-2/power/ >>> autosuspend_delay_ms runtime_active_time runtime_suspended_time >>> control runtime_status >> >> You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug >> attributes). >> >>>> If they are "async", you can write "disable" to this attribute to turn >>>> them into "sync" devices. I'd do this and see what happens. >> >> You may also turn off async suspend altogether: >> >> # echo 0 > /sys/power/pm_async >> >> and see if this helps. This does indeed help! >>>> Overall, it looks like some dependencies aren't properly represented >>>> by device links on this platform. >>> >>> Yes that would appear to be the case, but at the moment, I don't see >>> what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs >>> when suspending I2C controller. Exactly where is still a mystery. > > I checked in the meantime and found that the i2c subsystem enables > async suspend/resume for all devices, clients and controllers, so the > devices in question are "async" AFAICS. So that would make sense given that the above works. When it fails it appears to hang in dpm_wait_for_subordinate() when calling dpm_wait_for_children() from what I can tell. I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C itself non-async works. Jon -- nvpublic ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 14:39 ` Jon Hunter @ 2025-05-07 14:56 ` Rafael J. Wysocki 2025-05-07 15:39 ` Jon Hunter 0 siblings, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-07 14:56 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On Wed, May 7, 2025 at 4:39 PM Jon Hunter <jonathanh@nvidia.com> wrote: > > > On 07/05/2025 15:25, Rafael J. Wysocki wrote: > > ... > > >>>> So are the devices in question "async"? To check this, please see the > >>>> "async" attribute in the "power" subdirectory of the sysfs device > >>>> directory for each of them. > >>> > >>> I checked for both the I2C controller and ina3221 and don't see any > >>> 'async' files ... > >>> > >>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0040/power/ > >>> autosuspend_delay_ms runtime_active_time runtime_suspended_time > >>> control runtime_status > >>> $ ls /sys/class/i2c-dev/i2c-2/device/2-0041/power/ > >>> autosuspend_delay_ms runtime_active_time runtime_suspended_time > >>> control runtime_status > >>> $ ls /sys/class/i2c-dev/i2c-2/power/ > >>> autosuspend_delay_ms runtime_active_time runtime_suspended_time > >>> control runtime_status > >> > >> You need to set CONFIG_PM_ADVANCED_DEBUG to see those (and other debug > >> attributes). > >> > >>>> If they are "async", you can write "disable" to this attribute to turn > >>>> them into "sync" devices. I'd do this and see what happens. > >> > >> You may also turn off async suspend altogether: > >> > >> # echo 0 > /sys/power/pm_async > >> > >> and see if this helps. > > This does indeed help! > > >>>> Overall, it looks like some dependencies aren't properly represented > >>>> by device links on this platform. > >>> > >>> Yes that would appear to be the case, but at the moment, I don't see > >>> what it is. The ina3221 devices appear to suspend fine AFAICT, but hangs > >>> when suspending I2C controller. Exactly where is still a mystery. > > > > I checked in the meantime and found that the i2c subsystem enables > > async suspend/resume for all devices, clients and controllers, so the > > devices in question are "async" AFAICS. > > So that would make sense given that the above works. > > When it fails it appears to hang in dpm_wait_for_subordinate() when > calling dpm_wait_for_children() from what I can tell. So apparently one of the children has not been suspended yet when this happens. That's fine because it should be suspended at one point and the parent suspend should be unblocked, so it looks like the child suspend doesn't complete for some reason. > I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C > itself non-async works. What probably happens is that after the "PM: sleep: Suspend async parents after suspending children" , the i2c clients are suspended upfront (because they have no children) and when one of them has suspended, it triggers a parent suspend. The parent suspend then waits for the other client to complete suspending, but that cannot make progress for some reason. Before that patch, the i2c clients would have suspended only after all of the "sync" devices following them in dpm_list had been suspended (the list is processed in the reverse order during suspend), so it looks like there is a hidden dependency between one of the i2c clients and a "sync" device. If the above supposition is right, flagging the i2c client as "sync" will make the problem go away. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 14:56 ` Rafael J. Wysocki @ 2025-05-07 15:39 ` Jon Hunter 2025-05-07 16:43 ` Rafael J. Wysocki 2025-05-10 11:39 ` Rafael J. Wysocki 0 siblings, 2 replies; 13+ messages in thread From: Jon Hunter @ 2025-05-07 15:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On 07/05/2025 15:56, Rafael J. Wysocki wrote: ... > So apparently one of the children has not been suspended yet when this > happens. That's fine because it should be suspended at one point and > the parent suspend should be unblocked, so it looks like the child > suspend doesn't complete for some reason. > >> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C >> itself non-async works. > > What probably happens is that after the "PM: sleep: Suspend async > parents after suspending children" , the i2c clients are suspended > upfront (because they have no children) and when one of them has > suspended, it triggers a parent suspend. The parent suspend then > waits for the other client to complete suspending, but that cannot > make progress for some reason. > > Before that patch, the i2c clients would have suspended only after all > of the "sync" devices following them in dpm_list had been suspended > (the list is processed in the reverse order during suspend), so it > looks like there is a hidden dependency between one of the i2c clients > and a "sync" device. > > If the above supposition is right, flagging the i2c client as "sync" > will make the problem go away. So all the I2C controllers are 'sync' devices ... $ cat /sys/class/i2c-dev/i2c-*/power/async disabled disabled disabled disabled disabled disabled disabled The I2C clients on the problematic I2C controller are all 'async' devices ... $ cat /sys/class/i2c-dev/i2c-2/device/2-*/power/async enabled enabled enabled Setting all these to 'disabled' fixes the problem. However, also just setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221 devices seem to be fine being async). The 'cypd4226' device is interesting, because this one is a USB Type-C controller and there is a circular dependency between the Type-C and USB PHY (see arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts). If I make the following change then this does fix it ... diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c b/drivers/usb/typec/ucsi/ucsi_ccg.c index f01e4ef6619d..e9a9df1431af 100644 --- a/drivers/usb/typec/ucsi/ucsi_ccg.c +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client) i2c_set_clientdata(client, uc); + device_disable_async_suspend(uc->dev); + pm_runtime_set_active(uc->dev); pm_runtime_enable(uc->dev); pm_runtime_use_autosuspend(uc->dev); Is this the right fix for this? Jon -- nvpublic ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 15:39 ` Jon Hunter @ 2025-05-07 16:43 ` Rafael J. Wysocki 2025-05-08 13:38 ` Jon Hunter 2025-05-10 11:39 ` Rafael J. Wysocki 1 sibling, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-07 16:43 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On Wed, May 7, 2025 at 5:40 PM Jon Hunter <jonathanh@nvidia.com> wrote: > > > > On 07/05/2025 15:56, Rafael J. Wysocki wrote: > > ... > > > So apparently one of the children has not been suspended yet when this > > happens. That's fine because it should be suspended at one point and > > the parent suspend should be unblocked, so it looks like the child > > suspend doesn't complete for some reason. > > > >> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C > >> itself non-async works. > > > > What probably happens is that after the "PM: sleep: Suspend async > > parents after suspending children" , the i2c clients are suspended > > upfront (because they have no children) and when one of them has > > suspended, it triggers a parent suspend. The parent suspend then > > waits for the other client to complete suspending, but that cannot > > make progress for some reason. > > > > Before that patch, the i2c clients would have suspended only after all > > of the "sync" devices following them in dpm_list had been suspended > > (the list is processed in the reverse order during suspend), so it > > looks like there is a hidden dependency between one of the i2c clients > > and a "sync" device. > > > > If the above supposition is right, flagging the i2c client as "sync" > > will make the problem go away. > > So all the I2C controllers are 'sync' devices ... > > $ cat /sys/class/i2c-dev/i2c-*/power/async > disabled > disabled > disabled > disabled > disabled > disabled > disabled > > The I2C clients on the problematic I2C controller are all 'async' > devices ... > > $ cat /sys/class/i2c-dev/i2c-2/device/2-*/power/async > enabled > enabled > enabled > > Setting all these to 'disabled' fixes the problem. However, also just > setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221 > devices seem to be fine being async). The 'cypd4226' device is > interesting, because this one is a USB Type-C controller and there is a > circular dependency between the Type-C and USB PHY (see > arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts). Circular dependencies are problematic for suspend/resume in general. I wonder if fw_devlink can resolve this? > If I make the following change then this does fix it ... > > diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c > b/drivers/usb/typec/ucsi/ucsi_ccg.c > index f01e4ef6619d..e9a9df1431af 100644 > --- a/drivers/usb/typec/ucsi/ucsi_ccg.c > +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c > @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client) > > i2c_set_clientdata(client, uc); > > + device_disable_async_suspend(uc->dev); > + > pm_runtime_set_active(uc->dev); > pm_runtime_enable(uc->dev); > pm_runtime_use_autosuspend(uc->dev); > > Is this the right fix for this? At least as a stop-gap, yes. In order to enable async suspend for a device, one needs to at least assume with sufficiently high confidence that it will be safe to reorder it with respect to any other device in the system except for the devices having known dependencies on the device in question. Those known dependencies either are parent-child connections or they need to be represented by device links. In this particular case, it is painfully clear that the suspend of the device in question cannot be reordered with respect to at least one other device where the dependency is not known in the above sense. Thus the device in question should not be allowed to suspend asynchronously. Would it be better to represent the dependency in question via a device link? Yes, it would, but until that happens, disabling async suspend is the right thing to do IMV. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 16:43 ` Rafael J. Wysocki @ 2025-05-08 13:38 ` Jon Hunter 2025-05-08 18:06 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Jon Hunter @ 2025-05-08 13:38 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On 07/05/2025 17:43, Rafael J. Wysocki wrote: ... >> Setting all these to 'disabled' fixes the problem. However, also just >> setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221 >> devices seem to be fine being async). The 'cypd4226' device is >> interesting, because this one is a USB Type-C controller and there is a >> circular dependency between the Type-C and USB PHY (see >> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts). > > Circular dependencies are problematic for suspend/resume in general. > I wonder if fw_devlink can resolve this? I booted with fw_devlink=on, but this did not resolve the problem :-( >> If I make the following change then this does fix it ... >> >> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c >> b/drivers/usb/typec/ucsi/ucsi_ccg.c >> index f01e4ef6619d..e9a9df1431af 100644 >> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c >> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c >> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client) >> >> i2c_set_clientdata(client, uc); >> >> + device_disable_async_suspend(uc->dev); >> + >> pm_runtime_set_active(uc->dev); >> pm_runtime_enable(uc->dev); >> pm_runtime_use_autosuspend(uc->dev); >> >> Is this the right fix for this? > > At least as a stop-gap, yes. > > In order to enable async suspend for a device, one needs to at least > assume with sufficiently high confidence that it will be safe to > reorder it with respect to any other device in the system except for > the devices having known dependencies on the device in question. > Those known dependencies either are parent-child connections or they > need to be represented by device links. > > In this particular case, it is painfully clear that the suspend of the > device in question cannot be reordered with respect to at least one > other device where the dependency is not known in the above sense. > > Thus the device in question should not be allowed to suspend asynchronously. > > Would it be better to represent the dependency in question via a > device link? Yes, it would, but until that happens, disabling async > suspend is the right thing to do IMV. OK so it is not clear to me if the PM core should be handling this for us. Is that what you are implying/suggesting? Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-08 13:38 ` Jon Hunter @ 2025-05-08 18:06 ` Rafael J. Wysocki 0 siblings, 0 replies; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-08 18:06 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On Thu, May 8, 2025 at 3:38 PM Jon Hunter <jonathanh@nvidia.com> wrote: > > > On 07/05/2025 17:43, Rafael J. Wysocki wrote: > > ... > > >> Setting all these to 'disabled' fixes the problem. However, also just > >> setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221 > >> devices seem to be fine being async). The 'cypd4226' device is > >> interesting, because this one is a USB Type-C controller and there is a > >> circular dependency between the Type-C and USB PHY (see > >> arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts). > > > > Circular dependencies are problematic for suspend/resume in general. > > I wonder if fw_devlink can resolve this? > > I booted with fw_devlink=on, but this did not resolve the problem :-( I see, but thanks for checking. > >> If I make the following change then this does fix it ... > >> > >> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c > >> b/drivers/usb/typec/ucsi/ucsi_ccg.c > >> index f01e4ef6619d..e9a9df1431af 100644 > >> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c > >> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c > >> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client) > >> > >> i2c_set_clientdata(client, uc); > >> > >> + device_disable_async_suspend(uc->dev); > >> + > >> pm_runtime_set_active(uc->dev); > >> pm_runtime_enable(uc->dev); > >> pm_runtime_use_autosuspend(uc->dev); > >> > >> Is this the right fix for this? > > > > At least as a stop-gap, yes. > > > > In order to enable async suspend for a device, one needs to at least > > assume with sufficiently high confidence that it will be safe to > > reorder it with respect to any other device in the system except for > > the devices having known dependencies on the device in question. > > Those known dependencies either are parent-child connections or they > > need to be represented by device links. > > > > In this particular case, it is painfully clear that the suspend of the > > device in question cannot be reordered with respect to at least one > > other device where the dependency is not known in the above sense. > > > > Thus the device in question should not be allowed to suspend asynchronously. > > > > Would it be better to represent the dependency in question via a > > device link? Yes, it would, but until that happens, disabling async > > suspend is the right thing to do IMV. > > OK so it is not clear to me if the PM core should be handling this for > us. No, it can't because dependency information is missing. > Is that what you are implying/suggesting? No. Async suspend needs to be disabled for this device for now I'm afraid and I don't think that it will make suspend/resume take much more time. Thanks! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-07 15:39 ` Jon Hunter 2025-05-07 16:43 ` Rafael J. Wysocki @ 2025-05-10 11:39 ` Rafael J. Wysocki 2025-05-10 11:50 ` Jon Hunter 1 sibling, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2025-05-10 11:39 UTC (permalink / raw) To: Jon Hunter Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On Wed, May 7, 2025 at 5:40 PM Jon Hunter <jonathanh@nvidia.com> wrote: > > > > On 07/05/2025 15:56, Rafael J. Wysocki wrote: > > ... > > > So apparently one of the children has not been suspended yet when this > > happens. That's fine because it should be suspended at one point and > > the parent suspend should be unblocked, so it looks like the child > > suspend doesn't complete for some reason. > > > >> I will enable the PM_ADVANCED_DEBUG and confirm that making the I2C > >> itself non-async works. > > > > What probably happens is that after the "PM: sleep: Suspend async > > parents after suspending children" , the i2c clients are suspended > > upfront (because they have no children) and when one of them has > > suspended, it triggers a parent suspend. The parent suspend then > > waits for the other client to complete suspending, but that cannot > > make progress for some reason. > > > > Before that patch, the i2c clients would have suspended only after all > > of the "sync" devices following them in dpm_list had been suspended > > (the list is processed in the reverse order during suspend), so it > > looks like there is a hidden dependency between one of the i2c clients > > and a "sync" device. > > > > If the above supposition is right, flagging the i2c client as "sync" > > will make the problem go away. > > So all the I2C controllers are 'sync' devices ... > > $ cat /sys/class/i2c-dev/i2c-*/power/async > disabled > disabled > disabled > disabled > disabled > disabled > disabled > > The I2C clients on the problematic I2C controller are all 'async' > devices ... > > $ cat /sys/class/i2c-dev/i2c-2/device/2-*/power/async > enabled > enabled > enabled > > Setting all these to 'disabled' fixes the problem. However, also just > setting the 'cypd4226' device to 'sync' fixes the problem (the ina3221 > devices seem to be fine being async). The 'cypd4226' device is > interesting, because this one is a USB Type-C controller and there is a > circular dependency between the Type-C and USB PHY (see > arch/arm64/boot/dts/nvidia/tegra194-p2972-0000.dts). > > If I make the following change then this does fix it ... > > diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c > b/drivers/usb/typec/ucsi/ucsi_ccg.c > index f01e4ef6619d..e9a9df1431af 100644 > --- a/drivers/usb/typec/ucsi/ucsi_ccg.c > +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c > @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client) > > i2c_set_clientdata(client, uc); > > + device_disable_async_suspend(uc->dev); > + > pm_runtime_set_active(uc->dev); > pm_runtime_enable(uc->dev); > pm_runtime_use_autosuspend(uc->dev); > I'm going to pick up this patch for 6.16 and add a changelog to it, so please consider providing a Signed-off-by: tag for this change. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent 2025-05-10 11:39 ` Rafael J. Wysocki @ 2025-05-10 11:50 ` Jon Hunter 0 siblings, 0 replies; 13+ messages in thread From: Jon Hunter @ 2025-05-10 11:50 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, Linux PM, LKML, Alan Stern, Ulf Hansson, Johan Hovold, Manivannan Sadhasivam, Saravana Kannan, linux-tegra@vger.kernel.org On 10/05/2025 12:39, Rafael J. Wysocki wrote: ... >> If I make the following change then this does fix it ... >> >> diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c >> b/drivers/usb/typec/ucsi/ucsi_ccg.c >> index f01e4ef6619d..e9a9df1431af 100644 >> --- a/drivers/usb/typec/ucsi/ucsi_ccg.c >> +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c >> @@ -1483,6 +1483,8 @@ static int ucsi_ccg_probe(struct i2c_client *client) >> >> i2c_set_clientdata(client, uc); >> >> + device_disable_async_suspend(uc->dev); >> + >> pm_runtime_set_active(uc->dev); >> pm_runtime_enable(uc->dev); >> pm_runtime_use_autosuspend(uc->dev); >> > > I'm going to pick up this patch for 6.16 and add a changelog to it, so > please consider providing a Signed-off-by: tag for this change. OK thanks! Yes, please add my ... Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Jon -- nvpublic ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-05-10 11:51 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <10629535.nUPlyArG6x@rjwysocki.net>
[not found] ` <22630663.EfDdHjke4D@rjwysocki.net>
2025-05-01 9:51 ` [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent Jon Hunter
2025-05-02 20:33 ` Rafael J. Wysocki
2025-05-07 13:21 ` Jon Hunter
2025-05-07 13:39 ` Rafael J. Wysocki
2025-05-07 14:25 ` Rafael J. Wysocki
2025-05-07 14:39 ` Jon Hunter
2025-05-07 14:56 ` Rafael J. Wysocki
2025-05-07 15:39 ` Jon Hunter
2025-05-07 16:43 ` Rafael J. Wysocki
2025-05-08 13:38 ` Jon Hunter
2025-05-08 18:06 ` Rafael J. Wysocki
2025-05-10 11:39 ` Rafael J. Wysocki
2025-05-10 11:50 ` Jon Hunter
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox