Dead lock with clock global prepare_lock mutex and device's power.runtime

linux-pm.vger.kernel.org archive mirror

* Dead lock with clock global prepare_lock mutex and device's power.runtime_status
@ 2025-07-01  3:16 Carlos Song
  2025-07-07 10:58 ` Peng Fan
  0 siblings, 1 reply; 5+ messages in thread
From: Carlos Song @ 2025-07-01  3:16 UTC (permalink / raw)
  To: mturquette@baylibre.com, sboyd@kernel.org, rafael@kernel.org,
	pavel@kernel.org, len.brown@intel.com, Greg Kroah-Hartman,
	dakr@kernel.org, Aisheng Dong, Andi Shyti, shawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	Frank Li
  Cc: linux-clk@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-i2c@vger.kernel.org,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org, Bough Chen,
	Jun Li

Hi, All:

We recently hit a deadlock. We think it is a common issue, but we are not sure how to fix it.

We use the gpio-gate-clock clock provider (drivers/clk/clk-gpio.c); its GPIO belongs to an I2C GPIO expander (drivers/gpio/gpio-pcf857x.c). Our I2C driver enables runtime PM (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). The system randomly hangs at reboot.

The deadlock happens with the call stacks below:

Task 117                                                Task 120

schedule()
clk_prepare_lock()--> wait prepare_lock(mutex_lock)     schedule() wait for power.runtime_status exit RPM_SUSPENDING
                           ^^^^ A                       ^^^^ B
clk_bulk_unprepare()                                    rpm_resume()
lpi2c_runtime_suspend()                                 pm_runtime_resume_and_get()
...                                                     lpi2c_imx_xfer()
                                                        ...
rpm_suspend() set RPM_SUSPENDING                        pcf857x_set();
                           ^^^^ B                       ...
                                                        clk_prepare_lock() --> hold prepare_lock
                                                        ^^^^ A
                                                        ...


Task 117 sets power.runtime_status to RPM_SUSPENDING (B) and waits for task 120 to release the clock framework's global prepare mutex (A).

Task 120 holds the global prepare mutex (A) and waits for power.runtime_status to leave RPM_SUSPENDING (B).
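
The circular wait described above can be sketched as a small wait-for model in userspace C. This is only an illustration under stated assumptions, not kernel code: the struct task, owner_of(), in_wait_cycle() and model_reported_deadlock() names are hypothetical, and RPM_SUSPENDING is treated as if it were a resource "held" by the task that set it, since only that task can move the device out of that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
enum resource {
    RES_NONE = -1,
    RES_PREPARE_LOCK,    /* the clk framework's global prepare mutex */
    RES_RPM_SUSPENDING,  /* device stuck in RPM_SUSPENDING */
};

struct task {
    int pid;
    enum resource holds;      /* what this task currently owns */
    enum resource waits_for;  /* what this task is blocked on */
};

/* Return the index of the task that owns res, or -1 if nobody does. */
static int owner_of(const struct task *tasks, size_t n, enum resource res)
{
    if (res == RES_NONE)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (tasks[i].holds == res)
            return (int)i;
    return -1;
}

/*
 * Follow "blocked on the owner of" edges starting at tasks[start].
 * If the walk returns to start, that task can never make progress.
 */
static bool in_wait_cycle(const struct task *tasks, size_t n, size_t start)
{
    int cur = (int)start;

    for (size_t hops = 0; hops < n; hops++) {
        cur = owner_of(tasks, n, tasks[cur].waits_for);
        if (cur < 0)
            return false;
        if ((size_t)cur == start)
            return true;
    }
    return false;
}

/* Build the two-task situation from the report and check it. */
bool model_reported_deadlock(void)
{
    const struct task tasks[] = {
        { 117, RES_RPM_SUSPENDING, RES_PREPARE_LOCK },
        { 120, RES_PREPARE_LOCK, RES_RPM_SUSPENDING },
    };

    return in_wait_cycle(tasks, 2, 0) && in_wait_cycle(tasks, 2, 1);
}
```

In this model each task is blocked on something only the other task can release, so the walk from either task leads back to itself. If task 120 did not hold the prepare mutex, the walk from task 117 would end without finding a cycle.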

The root cause is that the scope of the global prepare_lock is too broad: the gpio-gate-clock and the lpi2c clock are totally independent, yet they share one lock.

Although it may not happen in the upstream case because of [1], there are still other I2C buses, SPI buses, and other bus drivers. Unpreparing a clock is quite common in runtime suspend functions.

[1] The upstream driver does not use clk_unprepare in its suspend functions.

The full log is below:

INFO: task kworker/2:3:117(T117) is blocked on a mutex likely owned by task kworker/u16:5:120(T120).

[    6.955479][   T73] imx-lpi2c 42530000.i2c: lpi2c_runtime_suspend2
[    6.957437][  T120] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 == cpu 0x60100000
[    6.957437][  T120] ; no fixup was ever needed for this devicetree
[    6.964257][  T118] platform regulatory.0: Falling back to sysfs fallback for: regulatory.db
[    6.973579][  T120] imx-lpi2c 42530000.i2c: lpi2c_runtime_resume1
[    7.027143][  T120] imx-lpi2c 42530000.i2c: lpi2c_runtime_resume2 0
[    7.033984][  T120] -----------pcf857x_set in
[    7.038373][  T120] -----------------pcf857x_output in
[    7.043527][  T120] ----------------- gpio->write in
[    7.048520][  T117] imx-lpi2c 42530000.i2c: lpi2c_runtime_suspend
[    7.054774][  T120] i2c i2c-2: msg[0] w0/r1 0, data[0] is 7f
[    7.060448][  T120] i2c i2c-2: 42530000.i2c: pm_runtime_resume_and_get
[   67.030316][  T118] cfg80211: failed to load regulatory.db
[  244.059129][   T40] INFO: task kworker/2:3:117 blocked for more than 121 seconds.
[  244.066619][   T40]       Not tainted 6.15.0-rc2-next-20250417-06621-g7cd761409c73-dirty #7
[  244.075010][   T40] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  244.083572][   T40] task:kworker/2:3     state:D stack:0     pid:117   tgid:117   ppid:2      task_flags:0x4208060 flags:0x00000008
[  244.095438][   T40] Workqueue: pm pm_runtime_work
[  244.100157][   T40] Call trace:
[  244.103302][   T40]  __switch_to+0xf8/0x1a0 (T)
[  244.107882][   T40]  __schedule+0x418/0xfd8
[  244.112080][   T40]  schedule+0x4c/0x164
[  244.116055][   T40]  schedule_preempt_disabled+0x24/0x40
[  244.121392][   T40]  __mutex_lock+0x1d4/0x580
[  244.125798][   T40]  mutex_lock_nested+0x24/0x30
[  244.130436][   T40]  clk_prepare_lock+0x4c/0xa8
[  244.135018][   T40]  clk_unprepare+0x24/0x44
[  244.139298][   T40]  clk_bulk_unprepare+0x38/0x60
[  244.144048][   T40]  lpi2c_runtime_suspend+0x64/0x9c
[  244.149021][   T40]  pm_generic_runtime_suspend+0x2c/0x44
[  244.154475][   T40]  __rpm_callback+0x48/0x1ec
[  244.158935][   T40]  rpm_callback+0x74/0x80
[  244.163167][   T40]  rpm_suspend+0x104/0x668
[  244.167446][   T40]  pm_runtime_work+0xc8/0xcc
[  244.171939][   T40]  process_one_work+0x214/0x62c
[  244.176650][   T40]  worker_thread+0x1ac/0x34c
[  244.181144][   T40]  kthread+0x144/0x220
[  244.185082][   T40]  ret_from_fork+0x10/0x20
[  244.189435][   T40] INFO: task kworker/2:3:117 is blocked on a mutex likely owned by task kworker/u16:5:120.
[  244.199300][   T40] task:kworker/u16:5   state:D stack:0     pid:120   tgid:120   ppid:2      task_flags:0x4208060 flags:0x00000008
[  244.211164][   T40] Workqueue: async async_run_entry_fn
[  244.216404][   T40] Call trace:
[  244.219587][   T40]  __switch_to+0xf8/0x1a0 (T)
[  244.224127][   T40]  __schedule+0x418/0xfd8
[  244.228358][   T40]  schedule+0x4c/0x164
[  244.232298][   T40]  rpm_resume+0x1c8/0x734
[  244.236531][   T40]  __pm_runtime_resume+0x50/0x98
[  244.241338][   T40]  lpi2c_imx_xfer+0x58/0xe60
[  244.245829][   T40]  __i2c_transfer+0x1c4/0x828
[  244.250377][   T40]  i2c_smbus_xfer_emulated+0x1b8/0x708
[  244.255735][   T40]  __i2c_smbus_xfer+0x1a0/0x6f0
[  244.260447][   T40]  i2c_smbus_xfer+0x98/0x120
[  244.264939][   T40]  i2c_smbus_write_byte+0x2c/0x3c
[  244.269825][   T40]  i2c_write_le8+0x10/0x20
[  244.274152][   T40]  pcf857x_output+0x7c/0xc0
[  244.278527][   T40]  pcf857x_set+0x3c/0x5c
[  244.282672][   T40]  gpiochip_set+0x68/0xc0
[  244.286864][   T40]  gpiod_set_raw_value_commit+0xd4/0x1a0
[  244.292404][   T40]  gpiod_set_value_nocheck+0x34/0x60
[  244.297549][   T40]  gpiod_set_value_cansleep+0x24/0x60
[  244.302821][   T40]  clk_sleeping_gpio_gate_prepare+0x18/0x28
[  244.308582][   T40]  clk_core_prepare+0xbc/0x2a8
[  244.313247][   T40]  clk_prepare+0x28/0x44
[  244.317361][   T40]  clk_bulk_prepare+0x34/0xa0
[  244.321940][   T40]  imx_pcie_host_init+0xe0/0x434
[  244.326747][   T40]  dw_pcie_host_init+0x1b8/0x758
[  244.331587][   T40]  imx_pcie_probe+0x380/0x8e0
[  244.336133][   T40]  platform_probe+0x68/0xd8
[  244.340539][   T40]  really_probe+0xbc/0x2bc
[  244.344817][   T40]  __driver_probe_device+0x78/0x120
[  244.349916][   T40]  driver_probe_device+0x3c/0x160
[  244.354801][   T40]  __device_attach_driver+0xb8/0x140
[  244.359986][   T40]  bus_for_each_drv+0x88/0xe8
[  244.364534][   T40]  __device_attach_async_helper+0xb8/0xdc
[  244.370161][   T40]  async_run_entry_fn+0x34/0xe0
[  244.374882][   T40]  process_one_work+0x214/0x62c
[  244.379634][   T40]  worker_thread+0x1ac/0x34c
[  244.384102][   T40]  kthread+0x144/0x220
[  244.388076][   T40]  ret_from_fork+0x10/0x20

Best regards,
Carlos Song


* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
  2025-07-07 10:58 ` Peng Fan
@ 2025-07-07 10:36   ` Marc Kleine-Budde
  2025-07-07 17:28     ` Chen-Yu Tsai
  0 siblings, 1 reply; 5+ messages in thread
From: Marc Kleine-Budde @ 2025-07-07 10:36 UTC (permalink / raw)
  To: Peng Fan
  Cc: Carlos Song, Ulf Hansson, Stephen Boyd, imx@lists.linux.dev,
	rafael@kernel.org, mturquette@baylibre.com, Frank Li,
	linux-i2c@vger.kernel.org, dakr@kernel.org, festevam@gmail.com,
	linux-clk@vger.kernel.org, pavel@kernel.org, Bough Chen,
	len.brown@intel.com, Andi Shyti, linux-pm@vger.kernel.org,
	s.hauer@pengutronix.de, linux-arm-kernel@lists.infradead.org,
	Aisheng Dong, Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	kernel@pengutronix.de, shawnguo@kernel.org, Jun Li

On 07.07.2025 18:58:16, Peng Fan wrote:
> On Tue, Jul 01, 2025 at 03:16:08AM +0000, Carlos Song wrote:
> >Hi, All:
> >
> >We met the dead lock issue recently and think it should be common issue and not sure how to fix it.
> >
> >We use gpio-gate-clock clock provider (drivers/clk/clk-gpio.c), gpio is one of i2c gpio expander (drivers/gpio/gpio-pcf857x.c). Our i2c driver enable run time pm (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). System random blocked when at reboot.
> >
> >The dead lock happen as below call stacks
> >
> >Task 117                                                Task 120
> >
> >schedule()
> >clk_prepare_lock()--> wait prepare_lock(mutex_lock)     schedule() wait for power.runtime_status exit RPM_SUSPENDING
> >                           ^^^^ A                       ^^^^ B
> >clk_bulk_unprepare()                                    rpm_resume()
> >lpi2c_runtime_suspend()                                 pm_runtime_resume_and_get()
> >...                                                     lpi2c_imx_xfer()
> >                                                        ...
> >rpm_suspend() set RPM_SUSPENDING                        pcf857x_set();
> >                           ^^^^ B                       ...
> >                                                        clk_prepare_lock() --> hold prepare_lock
> >                                                        ^^^^ A
> >                                                        ...
> >
> 
> This is a common issue that clk use a big prepare lock which is easy
> to trigger dead lock with runtime pm. I recalled that pengutronix raised
> this, but could not find the information.

Alexander Stein stumbled over this issue some time ago:

| https://lore.kernel.org/all/20230421-kinfolk-glancing-e185fd9c47b4-mkl@pengutronix.de/

I encountered it too, while trying to add a clock provider driver for a
SPI attached CAN controller which uses runtime pm.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |


* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
  2025-07-01  3:16 Dead lock with clock global prepare_lock mutex and device's power.runtime_status Carlos Song
@ 2025-07-07 10:58 ` Peng Fan
  2025-07-07 10:36   ` Marc Kleine-Budde
  0 siblings, 1 reply; 5+ messages in thread
From: Peng Fan @ 2025-07-07 10:58 UTC (permalink / raw)
  To: Carlos Song, Ulf Hansson, Stephen Boyd
  Cc: mturquette@baylibre.com, sboyd@kernel.org, rafael@kernel.org,
	pavel@kernel.org, len.brown@intel.com, Greg Kroah-Hartman,
	dakr@kernel.org, Aisheng Dong, Andi Shyti, shawnguo@kernel.org,
	s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
	Frank Li, linux-clk@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-i2c@vger.kernel.org,
	imx@lists.linux.dev, linux-kernel@vger.kernel.org, Bough Chen,
	Jun Li

+Ulf

On Tue, Jul 01, 2025 at 03:16:08AM +0000, Carlos Song wrote:
>Hi, All:
>
>We met the dead lock issue recently and think it should be common issue and not sure how to fix it.
>
>We use gpio-gate-clock clock provider (drivers/clk/clk-gpio.c), gpio is one of i2c gpio expander (drivers/gpio/gpio-pcf857x.c). Our i2c driver enable run time pm (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). System random blocked when at reboot.
>
>The dead lock happen as below call stacks
>
>Task 117                                                Task 120
>
>schedule()
>clk_prepare_lock()--> wait prepare_lock(mutex_lock)     schedule() wait for power.runtime_status exit RPM_SUSPENDING
>                           ^^^^ A                       ^^^^ B
>clk_bulk_unprepare()                                    rpm_resume()
>lpi2c_runtime_suspend()                                 pm_runtime_resume_and_get()
>...                                                     lpi2c_imx_xfer()
>                                                        ...
>rpm_suspend() set RPM_SUSPENDING                        pcf857x_set();
>                           ^^^^ B                       ...
>                                                        clk_prepare_lock() --> hold prepare_lock
>                                                        ^^^^ A
>                                                        ...
>

This is a common issue: clk uses one big prepare lock, which easily
triggers a deadlock with runtime PM. I recall that Pengutronix raised
this, but I could not find the information.

In this case, there are two clock providers that are independent.
So I think using one global prepare lock does not make sense here.

Stephen,
 I propose using a per-provider prepare lock if the providers are
 totally independent. What do you think?
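
To make the idea concrete, here is a minimal userspace sketch of why a per-provider prepare lock would break the cycle in this report. Everything in it is an assumption for illustration: prepare_lock_id(), scenario_deadlocks() and the provider enum are hypothetical names, not part of the clk framework, and the model ignores parent/child relationships between providers, which a real change would have to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers for the model, not clk framework API. */
enum {
    LOCK_RPM_STATE = 0,     /* device stuck in RPM_SUSPENDING */
    LOCK_PREPARE_BASE = 1,  /* first prepare-lock identifier */
};

enum provider {
    PROV_GPIO_GATE = 0,  /* gpio-gate-clock behind the I2C expander */
    PROV_LPI2C_CLK = 1,  /* provider of the lpi2c bus clocks */
};

/* With one global lock every provider maps to the same identifier. */
static int prepare_lock_id(enum provider p, bool per_provider)
{
    return per_provider ? LOCK_PREPARE_BASE + (int)p : LOCK_PREPARE_BASE;
}

struct task {
    int holds;
    int waits_for;
};

/* Two tasks deadlock when each waits for what the other holds. */
bool scenario_deadlocks(bool per_provider)
{
    /* Runtime suspend of lpi2c: state set, then unprepare lpi2c clocks. */
    struct task t117 = {
        .holds = LOCK_RPM_STATE,
        .waits_for = prepare_lock_id(PROV_LPI2C_CLK, per_provider),
    };
    /* Prepare of the gpio-gate clock: lock taken, then I2C transfer. */
    struct task t120 = {
        .holds = prepare_lock_id(PROV_GPIO_GATE, per_provider),
        .waits_for = LOCK_RPM_STATE,
    };

    return t117.waits_for == t120.holds && t120.waits_for == t117.holds;
}
```

In this model scenario_deadlocks(false) is true and scenario_deadlocks(true) is false: with separate locks, task 117 can unprepare the lpi2c clocks and finish suspending, after which task 120 can resume the bus.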

Thanks,
Peng

* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
  2025-07-07 10:36   ` Marc Kleine-Budde
@ 2025-07-07 17:28     ` Chen-Yu Tsai
  2025-07-22  9:14       ` Miquel Raynal
  0 siblings, 1 reply; 5+ messages in thread
From: Chen-Yu Tsai @ 2025-07-07 17:28 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Miquel Raynal, Peng Fan, Carlos Song, Ulf Hansson, Stephen Boyd,
	imx@lists.linux.dev, rafael@kernel.org, mturquette@baylibre.com,
	Frank Li, linux-i2c@vger.kernel.org, dakr@kernel.org,
	festevam@gmail.com, linux-clk@vger.kernel.org, pavel@kernel.org,
	Bough Chen, len.brown@intel.com, Andi Shyti,
	linux-pm@vger.kernel.org, s.hauer@pengutronix.de,
	linux-arm-kernel@lists.infradead.org, Aisheng Dong,
	Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	kernel@pengutronix.de, shawnguo@kernel.org, Jun Li

Hi,

On Mon, Jul 7, 2025 at 7:05 PM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> On 07.07.2025 18:58:16, Peng Fan wrote:
> > On Tue, Jul 01, 2025 at 03:16:08AM +0000, Carlos Song wrote:
> > >Hi, All:
> > >
> > >We met the dead lock issue recently and think it should be common issue and not sure how to fix it.
> > >
> > >We use gpio-gate-clock clock provider (drivers/clk/clk-gpio.c), gpio is one of i2c gpio expander (drivers/gpio/gpio-pcf857x.c). Our i2c driver enable run time pm (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). System random blocked when at reboot.
> > >
> > >The dead lock happen as below call stacks
> > >
> > >Task 117                                                Task 120
> > >
> > >schedule()
> > >clk_prepare_lock()--> wait prepare_lock(mutex_lock)     schedule() wait for power.runtime_status exit RPM_SUSPENDING
> > >                           ^^^^ A                       ^^^^ B
> > >clk_bulk_unprepare()                                    rpm_resume()
> > >lpi2c_runtime_suspend()                                 pm_runtime_resume_and_get()
> > >...                                                     lpi2c_imx_xfer()
> > >                                                        ...
> > >rpm_suspend() set RPM_SUSPENDING                        pcf857x_set();
> > >                           ^^^^ B                       ...
> > >                                                        clk_prepare_lock() --> hold prepare_lock
> > >                                                        ^^^^ A
> > >                                                        ...
> > >
> >
> > This is a common issue that clk use a big prepare lock which is easy
> > to trigger dead lock with runtime pm. I recalled that pengutronix raised
> > this, but could not find the information.
>
> Alexander Stein stumbled over this issue some time ago:
>
> | https://lore.kernel.org/all/20230421-kinfolk-glancing-e185fd9c47b4-mkl@pengutronix.de/
>
> I encountered it too, while trying to add a clock provider driver for a
> SPI attached CAN controller which uses runtime pm.

Miquel from Bootlin posted a more formal description of the problem and
some possible solutions last year [1].

[1] https://lore.kernel.org/all/20240527181928.4fc6b5f0@xps-13/

* Re: Dead lock with clock global prepare_lock mutex and device's power.runtime_status
  2025-07-07 17:28     ` Chen-Yu Tsai
@ 2025-07-22  9:14       ` Miquel Raynal
  0 siblings, 0 replies; 5+ messages in thread
From: Miquel Raynal @ 2025-07-22  9:14 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Marc Kleine-Budde, Peng Fan, Carlos Song, Ulf Hansson,
	Stephen Boyd, imx@lists.linux.dev, rafael@kernel.org,
	mturquette@baylibre.com, Frank Li, linux-i2c@vger.kernel.org,
	dakr@kernel.org, festevam@gmail.com, linux-clk@vger.kernel.org,
	pavel@kernel.org, Bough Chen, len.brown@intel.com, Andi Shyti,
	linux-pm@vger.kernel.org, s.hauer@pengutronix.de,
	linux-arm-kernel@lists.infradead.org, Aisheng Dong,
	Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	kernel@pengutronix.de, shawnguo@kernel.org, Jun Li,
	Thomas Petazzoni

Hello,

Thanks Chen-Yu for the heads up!

On 08/07/2025 at 01:28:08 +08, Chen-Yu Tsai <wens@kernel.org> wrote:

> Hi,
>
> On Mon, Jul 7, 2025 at 7:05 PM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>>
>> On 07.07.2025 18:58:16, Peng Fan wrote:
>> > On Tue, Jul 01, 2025 at 03:16:08AM +0000, Carlos Song wrote:
>> > >Hi, All:
>> > >
>> > >We met the dead lock issue recently and think it should be common issue and not sure how to fix it.
>> > >
>> > >We use gpio-gate-clock clock provider (drivers/clk/clk-gpio.c), gpio is one of i2c gpio expander (drivers/gpio/gpio-pcf857x.c). Our i2c driver enable run time pm (drivers/i2c/busses/i2c-imx-lpi2c.c [1]). System random blocked when at reboot.
>> > >
>> > >The dead lock happen as below call stacks
>> > >
>> > >Task 117                                                Task 120
>> > >
>> > >schedule()
>> > >clk_prepare_lock()--> wait prepare_lock(mutex_lock)     schedule() wait for power.runtime_status exit RPM_SUSPENDING
>> > >                           ^^^^ A                       ^^^^ B
>> > >clk_bulk_unprepare()                                    rpm_resume()
>> > >lpi2c_runtime_suspend()                                 pm_runtime_resume_and_get()
>> > >...                                                     lpi2c_imx_xfer()
>> > >                                                        ...
>> > >rpm_suspend() set RPM_SUSPENDING                        pcf857x_set();
>> > >                           ^^^^ B                       ...
>> > >                                                        clk_prepare_lock() --> hold prepare_lock
>> > >                                                        ^^^^ A
>> > >                                                        ...
>> > >
>> >
>> > This is a common issue that clk use a big prepare lock which is easy
>> > to trigger dead lock with runtime pm. I recalled that pengutronix raised
>> > this, but could not find the information.
>>
>> Alexander Stein stumbled over this issue some time ago:
>>
>> | https://lore.kernel.org/all/20230421-kinfolk-glancing-e185fd9c47b4-mkl@pengutronix.de/
>>
>> I encountered it too, while trying to add a clock provider driver for a
>> SPI attached CAN controller which uses runtime pm.
>
> Miquel from Bootlin posted a more formal description of the problem and
> some possible solutions last year [1].
>
> [1] https://lore.kernel.org/all/20240527181928.4fc6b5f0@xps-13/

I also sent an RFC in April:
https://lore.kernel.org/all/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/

I haven't got the energy yet to process the interesting feedback from
Rafael and Stephen. But getting a broader audience and maybe more
feedback will certainly help!

Thanks,
Miquèl
