* Dead lock with clock global prepare_lock mutex and device's power.runtime_status
From: Carlos Song @ 2025-07-01 3:16 UTC
To: mturquette@baylibre.com, sboyd@kernel.org, rafael@kernel.org,
pavel@kernel.org, len.brown@intel.com, Greg Kroah-Hartman,
dakr@kernel.org, Aisheng Dong, Andi Shyti, shawnguo@kernel.org,
s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
Frank Li
Cc: linux-clk@vger.kernel.org, linux-pm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-i2c@vger.kernel.org,
imx@lists.linux.dev, linux-kernel@vger.kernel.org, Bough Chen,
Jun Li
Hi, All:
We recently hit a deadlock that we think is a common issue, and we are not sure how to fix it.
We use the gpio-gate-clock clock provider (drivers/clk/clk-gpio.c), and the GPIO belongs to an I2C GPIO expander (drivers/gpio/gpio-pcf857x.c). Our I2C driver enables runtime PM (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). The system randomly hangs at reboot.
The deadlock happens with the call stacks below (most recent call first):
Task 117                                               Task 120

schedule()
clk_prepare_lock() --> wait prepare_lock (mutex_lock)  schedule() --> wait for power.runtime_status to exit RPM_SUSPENDING
    ^^^^ A                                                 ^^^^ B
clk_bulk_unprepare()                                   rpm_resume()
lpi2c_runtime_suspend()                                pm_runtime_resume_and_get()
...                                                    lpi2c_imx_xfer()
                                                       ...
rpm_suspend() --> set RPM_SUSPENDING                   pcf857x_set()
    ^^^^ B                                             ...
                                                       clk_prepare_lock() --> hold prepare_lock
                                                           ^^^^ A
                                                       ...
Task 117 sets power.runtime_status to RPM_SUSPENDING (B) and waits for task 120 to release the clock framework's global prepare mutex (A).
Task 120 holds the global prepare mutex (A) and waits for power.runtime_status to leave RPM_SUSPENDING (B).
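The circular wait described above can be sketched as a tiny wait-for graph in user-space C. This is only an illustrative model, not kernel code: the `task` structure, the `resource` enum and both helper functions are hypothetical names made up for this sketch, and "owning" RPM_STATUS is an assumption that stands for "being the task that has to finish the suspend before the status can change".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the two things the tasks block on. */
enum resource { NO_RESOURCE, PREPARE_LOCK, RPM_STATUS };

struct task {
    const char *name;
    enum resource holds;     /* what this task currently owns */
    enum resource waits_for; /* what this task is blocked on */
};

/* Return the task that owns resource r, or NULL if it is free. */
static const struct task *owner_of(const struct task *tasks, size_t n,
                                   enum resource r)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].holds == r)
            return &tasks[i];
    return NULL;
}

/* Follow "waits for the owner of" edges; true if we come back to start. */
bool in_wait_cycle(const struct task *tasks, size_t n,
                   const struct task *start)
{
    const struct task *t = start;

    for (size_t step = 0; step < n; step++) {
        if (t->waits_for == NO_RESOURCE)
            return false; /* t can run, so the chain ends here */
        t = owner_of(tasks, n, t->waits_for);
        if (t == NULL)
            return false; /* the resource is free */
        if (t == start)
            return true;  /* circular wait */
    }
    return false;
}

/* The situation from the report: each task owns what the other needs. */
bool reported_scenario_deadlocks(void)
{
    const struct task tasks[] = {
        { "T117 (runtime suspend)", RPM_STATUS,   PREPARE_LOCK },
        { "T120 (clk prepare)",     PREPARE_LOCK, RPM_STATUS   },
    };

    assert(tasks[0].holds != tasks[1].holds);
    return in_wait_cycle(tasks, 2, &tasks[0]);
}
```

Calling `reported_scenario_deadlocks()` follows T117 -> prepare_lock owner (T120) -> runtime status owner (T117) and reports the cycle. If either task did not need the other's resource, the chain would end and no cycle would be found.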
The root cause is that the scope of the global prepare_lock is too broad: the gpio-gate-clock and the lpi2c clock are totally independent.
Although it may not happen with the upstream lpi2c driver because of [1], there are other I2C and SPI bus drivers, and other bus drivers, that can be affected. Unpreparing clocks is quite common in runtime suspend functions.
[1] The upstream driver does not use clk_unprepare in its suspend functions.
The full log is below:
INFO: task kworker/2:3:117(T117) is blocked on a mutex likely owned by task kworker/u16:5:120(T120).
[ 6.955479][ T73] imx-lpi2c 42530000.i2c: lpi2c_runtime_suspend2
[ 6.957437][ T120] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 == cpu 0x60100000
[ 6.957437][ T120] ; no fixup was ever needed for this devicetree
[ 6.964257][ T118] platform regulatory.0: Falling back to sysfs fallback for: regulatory.db
[ 6.973579][ T120] imx-lpi2c 42530000.i2c: lpi2c_runtime_resume1
[ 7.027143][ T120] imx-lpi2c 42530000.i2c: lpi2c_runtime_resume2 0
[ 7.033984][ T120] -----------pcf857x_set in
[ 7.038373][ T120] -----------------pcf857x_output in
[ 7.043527][ T120] ----------------- gpio->write in
[ 7.048520][ T117] imx-lpi2c 42530000.i2c: lpi2c_runtime_suspend
[ 7.054774][ T120] i2c i2c-2: msg[0] w0/r1 0, data[0] is 7f
[ 7.060448][ T120] i2c i2c-2: 42530000.i2c: pm_runtime_resume_and_get
[ 67.030316][ T118] cfg80211: failed to load regulatory.db
[ 244.059129][ T40] INFO: task kworker/2:3:117 blocked for more than 121 seconds.
[ 244.066619][ T40] Not tainted 6.15.0-rc2-next-20250417-06621-g7cd761409c73-dirty #7
[ 244.075010][ T40] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 244.083572][ T40] task:kworker/2:3 state:D stack:0 pid:117 tgid:117 ppid:2 task_flags:0x4208060 flags:0x00000008
[ 244.095438][ T40] Workqueue: pm pm_runtime_work
[ 244.100157][ T40] Call trace:
[ 244.103302][ T40] __switch_to+0xf8/0x1a0 (T)
[ 244.107882][ T40] __schedule+0x418/0xfd8
[ 244.112080][ T40] schedule+0x4c/0x164
[ 244.116055][ T40] schedule_preempt_disabled+0x24/0x40
[ 244.121392][ T40] __mutex_lock+0x1d4/0x580
[ 244.125798][ T40] mutex_lock_nested+0x24/0x30
[ 244.130436][ T40] clk_prepare_lock+0x4c/0xa8
[ 244.135018][ T40] clk_unprepare+0x24/0x44
[ 244.139298][ T40] clk_bulk_unprepare+0x38/0x60
[ 244.144048][ T40] lpi2c_runtime_suspend+0x64/0x9c
[ 244.149021][ T40] pm_generic_runtime_suspend+0x2c/0x44
[ 244.154475][ T40] __rpm_callback+0x48/0x1ec
[ 244.158935][ T40] rpm_callback+0x74/0x80
[ 244.163167][ T40] rpm_suspend+0x104/0x668
[ 244.167446][ T40] pm_runtime_work+0xc8/0xcc
[ 244.171939][ T40] process_one_work+0x214/0x62c
[ 244.176650][ T40] worker_thread+0x1ac/0x34c
[ 244.181144][ T40] kthread+0x144/0x220
[ 244.185082][ T40] ret_from_fork+0x10/0x20
[ 244.189435][ T40] INFO: task kworker/2:3:117 is blocked on a mutex likely owned by task kworker/u16:5:120.
[ 244.199300][ T40] task:kworker/u16:5 state:D stack:0 pid:120 tgid:120 ppid:2 task_flags:0x4208060 flags:0x00000008
[ 244.211164][ T40] Workqueue: async async_run_entry_fn
[ 244.216404][ T40] Call trace:
[ 244.219587][ T40] __switch_to+0xf8/0x1a0 (T)
[ 244.224127][ T40] __schedule+0x418/0xfd8
[ 244.228358][ T40] schedule+0x4c/0x164
[ 244.232298][ T40] rpm_resume+0x1c8/0x734
[ 244.236531][ T40] __pm_runtime_resume+0x50/0x98
[ 244.241338][ T40] lpi2c_imx_xfer+0x58/0xe60
[ 244.245829][ T40] __i2c_transfer+0x1c4/0x828
[ 244.250377][ T40] i2c_smbus_xfer_emulated+0x1b8/0x708
[ 244.255735][ T40] __i2c_smbus_xfer+0x1a0/0x6f0
[ 244.260447][ T40] i2c_smbus_xfer+0x98/0x120
[ 244.264939][ T40] i2c_smbus_write_byte+0x2c/0x3c
[ 244.269825][ T40] i2c_write_le8+0x10/0x20
[ 244.274152][ T40] pcf857x_output+0x7c/0xc0
[ 244.278527][ T40] pcf857x_set+0x3c/0x5c
[ 244.282672][ T40] gpiochip_set+0x68/0xc0
[ 244.286864][ T40] gpiod_set_raw_value_commit+0xd4/0x1a0
[ 244.292404][ T40] gpiod_set_value_nocheck+0x34/0x60
[ 244.297549][ T40] gpiod_set_value_cansleep+0x24/0x60
[ 244.302821][ T40] clk_sleeping_gpio_gate_prepare+0x18/0x28
[ 244.308582][ T40] clk_core_prepare+0xbc/0x2a8
[ 244.313247][ T40] clk_prepare+0x28/0x44
[ 244.317361][ T40] clk_bulk_prepare+0x34/0xa0
[ 244.321940][ T40] imx_pcie_host_init+0xe0/0x434
[ 244.326747][ T40] dw_pcie_host_init+0x1b8/0x758
[ 244.331587][ T40] imx_pcie_probe+0x380/0x8e0
[ 244.336133][ T40] platform_probe+0x68/0xd8
[ 244.340539][ T40] really_probe+0xbc/0x2bc
[ 244.344817][ T40] __driver_probe_device+0x78/0x120
[ 244.349916][ T40] driver_probe_device+0x3c/0x160
[ 244.354801][ T40] __device_attach_driver+0xb8/0x140
[ 244.359986][ T40] bus_for_each_drv+0x88/0xe8
[ 244.364534][ T40] __device_attach_async_helper+0xb8/0xdc
[ 244.370161][ T40] async_run_entry_fn+0x34/0xe0
[ 244.374882][ T40] process_one_work+0x214/0x62c
[ 244.379634][ T40] worker_thread+0x1ac/0x34c
[ 244.384102][ T40] kthread+0x144/0x220
[ 244.388076][ T40] ret_from_fork+0x10/0x20
Best Regards
Carlos Song
* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
From: Marc Kleine-Budde @ 2025-07-07 10:36 UTC
To: Peng Fan
Cc: Carlos Song, Ulf Hansson, Stephen Boyd, imx@lists.linux.dev,
rafael@kernel.org, mturquette@baylibre.com, Frank Li,
linux-i2c@vger.kernel.org, dakr@kernel.org, festevam@gmail.com,
linux-clk@vger.kernel.org, pavel@kernel.org, Bough Chen,
len.brown@intel.com, Andi Shyti, linux-pm@vger.kernel.org,
s.hauer@pengutronix.de, linux-arm-kernel@lists.infradead.org,
Aisheng Dong, Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
kernel@pengutronix.de, shawnguo@kernel.org, Jun Li
On 07.07.2025 18:58:16, Peng Fan wrote:
> This is a common issue: clk uses one big prepare lock, which makes it
> easy to trigger a deadlock with runtime PM. I recall that Pengutronix
> raised this before, but I could not find the reference.
Alexander Stein stumbled over this issue some time ago:
| https://lore.kernel.org/all/20230421-kinfolk-glancing-e185fd9c47b4-mkl@pengutronix.de/
I encountered it too, while trying to add a clock provider driver for a
SPI attached CAN controller which uses runtime pm.
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
From: Peng Fan @ 2025-07-07 10:58 UTC
To: Carlos Song, Ulf Hansson, Stephen Boyd
Cc: mturquette@baylibre.com, sboyd@kernel.org, rafael@kernel.org,
pavel@kernel.org, len.brown@intel.com, Greg Kroah-Hartman,
dakr@kernel.org, Aisheng Dong, Andi Shyti, shawnguo@kernel.org,
s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
Frank Li, linux-clk@vger.kernel.org, linux-pm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-i2c@vger.kernel.org,
imx@lists.linux.dev, linux-kernel@vger.kernel.org, Bough Chen,
Jun Li
+Ulf
On Tue, Jul 01, 2025 at 03:16:08AM +0000, Carlos Song wrote:
>Hi, All:
>
>We recently hit a deadlock that we think is a common issue, and we are not sure how to fix it.
>
>We use the gpio-gate-clock clock provider (drivers/clk/clk-gpio.c), and the GPIO belongs to an I2C GPIO expander (drivers/gpio/gpio-pcf857x.c). Our I2C driver enables runtime PM (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). The system randomly hangs at reboot.
>
>The deadlock happens with the call stacks below (most recent call first):
>
>Task 117                                               Task 120
>
>schedule()
>clk_prepare_lock() --> wait prepare_lock (mutex_lock)  schedule() --> wait for power.runtime_status to exit RPM_SUSPENDING
>    ^^^^ A                                                 ^^^^ B
>clk_bulk_unprepare()                                   rpm_resume()
>lpi2c_runtime_suspend()                                pm_runtime_resume_and_get()
>...                                                    lpi2c_imx_xfer()
>                                                       ...
>rpm_suspend() --> set RPM_SUSPENDING                   pcf857x_set()
>    ^^^^ B                                             ...
>                                                       clk_prepare_lock() --> hold prepare_lock
>                                                           ^^^^ A
>                                                       ...
>
This is a common issue: clk uses one big prepare lock, which makes it
easy to trigger a deadlock with runtime PM. I recall that Pengutronix
raised this before, but I could not find the reference.
In this case, the two clock providers are independent, so I think
using one global prepare lock does not make sense here.
Stephen,
I propose using a per-provider prepare lock when the providers are
totally independent. What do you think?
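To make the proposal concrete, here is a minimal user-space sketch of the idea, assuming toy locks that only record whether they are held. It is not the clk framework API: `toy_lock`, `toy_provider` and `lpi2c_unprepare_can_proceed()` are hypothetical names, and the sketch ignores parent/child relationships between clocks, which a real per-provider scheme would have to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: only records whether somebody holds it (no threads involved). */
struct toy_lock {
    bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical provider: each one points at the prepare lock it uses. */
struct toy_provider {
    const char *name;
    struct toy_lock *prepare_lock;
};

/*
 * Returns true if the lpi2c clock's unprepare can take its prepare lock
 * while a gpio-gate-clock prepare is still in progress.
 */
bool lpi2c_unprepare_can_proceed(bool per_provider_locks)
{
    struct toy_lock global = { false };
    struct toy_lock gpio_only = { false };
    struct toy_provider lpi2c = { "lpi2c-clk", &global };
    struct toy_provider gpio_gate = {
        "gpio-gate-clock",
        per_provider_locks ? &gpio_only : &global,
    };
    bool ok;

    /* Task 120: prepare takes the lock and keeps it across the I2C write. */
    ok = toy_trylock(gpio_gate.prepare_lock);
    assert(ok);

    /* Task 117: runtime suspend of the I2C controller unprepares its clock. */
    ok = toy_trylock(lpi2c.prepare_lock);
    if (ok)
        toy_unlock(lpi2c.prepare_lock);

    toy_unlock(gpio_gate.prepare_lock);
    return ok;
}
```

With a shared lock (`per_provider_locks == false`) the unprepare cannot take the lock, which is the point where Task 117 blocks in the report. With separate locks it proceeds, so the suspend can finish and the pending I2C transfer can resume.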
Thanks,
Peng
* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
From: Chen-Yu Tsai @ 2025-07-07 17:28 UTC
To: Marc Kleine-Budde
Cc: Miquel Raynal, Peng Fan, Carlos Song, Ulf Hansson, Stephen Boyd,
imx@lists.linux.dev, rafael@kernel.org, mturquette@baylibre.com,
Frank Li, linux-i2c@vger.kernel.org, dakr@kernel.org,
festevam@gmail.com, linux-clk@vger.kernel.org, pavel@kernel.org,
Bough Chen, len.brown@intel.com, Andi Shyti,
linux-pm@vger.kernel.org, s.hauer@pengutronix.de,
linux-arm-kernel@lists.infradead.org, Aisheng Dong,
Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
kernel@pengutronix.de, shawnguo@kernel.org, Jun Li
Hi,
On Mon, Jul 7, 2025 at 7:05 PM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> On 07.07.2025 18:58:16, Peng Fan wrote:
> > This is a common issue: clk uses one big prepare lock, which makes it
> > easy to trigger a deadlock with runtime PM. I recall that Pengutronix
> > raised this before, but I could not find the reference.
>
> Alexander Stein stumbled over this issue some time ago:
>
> | https://lore.kernel.org/all/20230421-kinfolk-glancing-e185fd9c47b4-mkl@pengutronix.de/
>
> I encountered it too, while trying to add a clock provider driver for a
> SPI attached CAN controller which uses runtime pm.
Miquel from Bootlin posted a more formal description of the problem and
some possible solutions last year [1].
[1] https://lore.kernel.org/all/20240527181928.4fc6b5f0@xps-13/
* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
From: Miquel Raynal @ 2025-07-22 9:14 UTC
To: Chen-Yu Tsai
Cc: Marc Kleine-Budde, Peng Fan, Carlos Song, Ulf Hansson,
Stephen Boyd, imx@lists.linux.dev, rafael@kernel.org,
mturquette@baylibre.com, Frank Li, linux-i2c@vger.kernel.org,
dakr@kernel.org, festevam@gmail.com, linux-clk@vger.kernel.org,
pavel@kernel.org, Bough Chen, len.brown@intel.com, Andi Shyti,
linux-pm@vger.kernel.org, s.hauer@pengutronix.de,
linux-arm-kernel@lists.infradead.org, Aisheng Dong,
Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
kernel@pengutronix.de, shawnguo@kernel.org, Jun Li,
Thomas Petazzoni
Hello,
Thanks Chen-Yu for the heads up!
On 08/07/2025 at 01:28:08 +08, Chen-Yu Tsai <wens@kernel.org> wrote:
> Hi,
>
> On Mon, Jul 7, 2025 at 7:05 PM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>>
>> On 07.07.2025 18:58:16, Peng Fan wrote:
>> > This is a common issue: clk uses one big prepare lock, which makes it
>> > easy to trigger a deadlock with runtime PM. I recall that Pengutronix
>> > raised this before, but I could not find the reference.
>>
>> Alexander Stein stumbled over this issue some time ago:
>>
>> | https://lore.kernel.org/all/20230421-kinfolk-glancing-e185fd9c47b4-mkl@pengutronix.de/
>>
>> I encountered it too, while trying to add a clock provider driver for a
>> SPI attached CAN controller which uses runtime pm.
>
> Miquel from Bootlin posted a more formal description of the problem and
> some possible solutions last year [1].
>
> [1] https://lore.kernel.org/all/20240527181928.4fc6b5f0@xps-13/
I also sent an RFC in April:
https://lore.kernel.org/all/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/
I haven't got the energy yet to process the interesting feedback from
Rafael and Stephen. But getting a broader audience and maybe more
feedback will certainly help!
Thanks,
Miquèl