linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM
@ 2025-03-26 18:26 Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers Miquel Raynal
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

As explained in the following thread, there is a known ABBA locking
dependency between clk and runtime PM.
Link: https://lore.kernel.org/linux-clk/20240527181928.4fc6b5f0@xps-13/

The problem is that the clk subsystem uses a mutex to protect concurrent
accesses to its tree structure, and so do other subsystems such as
generic power domains. While it holds its own mutex, the clk subsystem
performs runtime PM calls which end up executing callbacks from other
subsystems (again, gen PD is in the loop). But typically power domains
may also need to perform clock related operations, and thus the
following two situations may happen:

mutex_lock(clk);
mutex_lock(genpd);

or

mutex_lock(genpd);
mutex_lock(clk);

As of today I know that at least NXP i.MX8MP and MediaTek MT8183 SoCs
are complex enough to face this kind of issues.

There's been a first workaround to "silence" lockdep with the most
obvious case triggering the warning: making sure all clocks are RPM
enabled before running the clk_disable_unused() work, but this is just
addressing one situation among many other potentially problematic
situations. In the past, both Laurent Pinchart and Marek Vasut have
experienced these issues when enabling HDMI and audio support,
respectively.

Following a discussion we had at last Plumbers with Steven, I am
proposing to decouple both locks by changing a bit the clk approach:
let's always runtime resume all clocks that we *might* need before
taking the clock lock. But how do we know the list? Well, depending on
the situation we may either need to wake up:
- the upper part of the tree during prepare/unprepare operations.
- the lower part of the tree during (read) rate operations.
- the upper part and the lower part of the tree otherwise (especially
  during rate changes which may involve reparenting).

Luckily, we do not need to do that by hand, are more importantly we do
not need to use the clock tree for that because thanks to the work from
Saravana, we already have device links describing exhaustively the
consumer/supplier relationships. The clock topology (from a runtime PM
perspective) is reflected in these links. In practice, we do not care
about all consumers, but the few clock operations that will actually
trigger runtime PM operations are probably not impacting enough to
justify something more complex.

So here it is: every patch in this series decouples the two locks in
various places of the clock subsystem, until we reach a point where all
needed clocks will always be resumed before acquiring the core lock. It
obviously requires a few new helpers in the RPM core which may probably
be enhanced, I've tried to keep them as simple and straightforward as
possible.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Miquel Raynal (10):
      PM: runtime: Add helpers to resume consumers
      clk: Improve comments with usual punctuation
      clk: Avoid non needed runtime PM calls
      clk: Avoid open coded-logic when a there is a helper available
      clk: Move runtime PM calls out of the prepare_lock in clk_init()
      clk: Move runtime PM calls out of the prepare_lock in clk_prepare()
      clk: Ensure all RPM enabled clocks are enabled before reparenting orphans
      clk: Move runtime PM calls out of the prepare_lock in clk_unregister()
      clk: Make sure clock parents and children are resumed when necessary
      clk: Fix the ABBA locking issue with runtime PM subcalls

 drivers/base/power/runtime.c |  54 ++++++++++++
 drivers/clk/clk.c            | 204 +++++++++++++++++++++++++++++++------------
 include/linux/pm_runtime.h   |   2 +
 3 files changed, 204 insertions(+), 56 deletions(-)
---
base-commit: ab6df33805e6e6e4ac1a519cfcade3f7f19f6ff1
change-id: 20250307-cross-lock-dep-e65f285ce8e1

Best regards,
-- 
Miquel Raynal <miquel.raynal@bootlin.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 19:18   ` Rafael J. Wysocki
  2025-03-26 18:26 ` [PATCH RFC 02/10] clk: Improve comments with usual punctuation Miquel Raynal
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

The runtime PM core currently allows to runtime resume/suspend a device,
or its suppliers.

Let's make it also possible to runtime resume/suspend consumers.

Consumers and suppliers are seen here through the description made by
device_links.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/base/power/runtime.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_runtime.h   |  2 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 2ee45841486bc73225b3e971164466647b3ce6d3..04bb66c18e3e4a45751fb3f9a6a1267d73757310 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1841,6 +1841,60 @@ void pm_runtime_put_suppliers(struct device *dev)
 	device_links_read_unlock(idx);
 }
 
+static void __pm_runtime_get_consumers(struct device *dev)
+{
+	struct device_link *link;
+
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
+				device_links_read_lock_held())
+		if (link->flags & DL_FLAG_PM_RUNTIME) {
+			pm_runtime_get_sync(link->consumer);
+			__pm_runtime_get_consumers(link->consumer);
+		}
+}
+
+/**
+ * pm_runtime_get_consumers - Resume and reference-count consumer devices.
+ * @dev: Supplier device.
+ */
+void pm_runtime_get_consumers(struct device *dev)
+{
+	int idx;
+
+	idx = device_links_read_lock();
+
+	__pm_runtime_get_consumers(dev);
+
+	device_links_read_unlock(idx);
+}
+
+static void __pm_runtime_put_consumers(struct device *dev)
+{
+	struct device_link *link;
+
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
+				device_links_read_lock_held())
+		if (link->flags & DL_FLAG_PM_RUNTIME) {
+			pm_runtime_put(link->consumer);
+			__pm_runtime_put_consumers(link->consumer);
+		}
+}
+
+/**
+ * pm_runtime_put_consumers - Drop references to consumer devices.
+ * @dev: Supplier device.
+ */
+void pm_runtime_put_consumers(struct device *dev)
+{
+	int idx;
+
+	idx = device_links_read_lock();
+
+	__pm_runtime_put_consumers(dev);
+
+	device_links_read_unlock(idx);
+}
+
 void pm_runtime_new_link(struct device *dev)
 {
 	spin_lock_irq(&dev->power.lock);
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index d39dc863f612fe18dc34182117f87908d63c8e6d..151c885a3f421f09509232f144618da62297d61d 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -89,6 +89,8 @@ extern u64 pm_runtime_autosuspend_expiration(struct device *dev);
 extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
 extern void pm_runtime_get_suppliers(struct device *dev);
 extern void pm_runtime_put_suppliers(struct device *dev);
+extern void pm_runtime_get_consumers(struct device *dev);
+extern void pm_runtime_put_consumers(struct device *dev);
 extern void pm_runtime_new_link(struct device *dev);
 extern void pm_runtime_drop_link(struct device_link *link);
 extern void pm_runtime_release_supplier(struct device_link *link);

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 02/10] clk: Improve comments with usual punctuation
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 03/10] clk: Avoid non needed runtime PM calls Miquel Raynal
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

These two-line comments did not meant anything to me until I figured out
they were two separated sentences. Clarify these comments.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index cf7720b9172ff223d86227aad144e15375ddfd86..7df9965bcbdffd641e6dbf5bff3d3b20079a3af3 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -289,8 +289,8 @@ static bool clk_core_is_prepared(struct clk_core *core)
 	bool ret = false;
 
 	/*
-	 * .is_prepared is optional for clocks that can prepare
-	 * fall back to software usage counter if it is missing
+	 * .is_prepared is optional for clocks that can prepare.
+	 * Fall back to software usage counter if it is missing.
 	 */
 	if (!core->ops->is_prepared)
 		return core->prepare_count;
@@ -308,8 +308,8 @@ static bool clk_core_is_enabled(struct clk_core *core)
 	bool ret = false;
 
 	/*
-	 * .is_enabled is only mandatory for clocks that gate
-	 * fall back to software usage counter if .is_enabled is missing
+	 * .is_enabled is only mandatory for clocks that gate.
+	 * Fall back to software usage counter if .is_enabled is missing
 	 */
 	if (!core->ops->is_enabled)
 		return core->enable_count;

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 03/10] clk: Avoid non needed runtime PM calls
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 02/10] clk: Improve comments with usual punctuation Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 04/10] clk: Avoid open coded-logic when a there is a helper available Miquel Raynal
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

clk_core_is_prepared() needs the clock to be runtime resumed in order to
call the ->is_prepared() callback. But at the same time, clk_prepare()
runtime resumes the clock and clk_unprepare() runtime disables it.

The fact that the clock might be runtime resumed do not indicate it's
been prepared, however the fact that it's been prepared implies that
it's been runtime resumed.

We can safely check the runtime status of the clock (and RPM increment
it in this case) instead of actually calling resume. With this little
trick, clk_core_is_prepared() can be called from anywhere without extra
constraint regarding the fact that the prepare_lock mutex might be
acquired or not already.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 7df9965bcbdffd641e6dbf5bff3d3b20079a3af3..1c15d72cd3daeeb5bb4f0d94c9f387526fab75ae 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -119,6 +119,20 @@ static int clk_pm_runtime_get(struct clk_core *core)
 	return pm_runtime_resume_and_get(core->dev);
 }
 
+static int clk_pm_runtime_get_if_active(struct clk_core *core)
+{
+	int ret;
+
+	if (!core || !core->rpm_enabled)
+		return 0;
+
+	ret = pm_runtime_get_if_active(core->dev);
+	if (ret == 1)
+		return 0;
+
+	return -EINVAL;
+}
+
 static void clk_pm_runtime_put(struct clk_core *core)
 {
 	if (!core->rpm_enabled)
@@ -295,7 +309,7 @@ static bool clk_core_is_prepared(struct clk_core *core)
 	if (!core->ops->is_prepared)
 		return core->prepare_count;
 
-	if (!clk_pm_runtime_get(core)) {
+	if (!clk_pm_runtime_get_if_active(core)) {
 		ret = core->ops->is_prepared(core->hw);
 		clk_pm_runtime_put(core);
 	}

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 04/10] clk: Avoid open coded-logic when a there is a helper available
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (2 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 03/10] clk: Avoid non needed runtime PM calls Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 05/10] clk: Move runtime PM calls out of the prepare_lock in clk_init() Miquel Raynal
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

The logic for checking whether a clock is already runtime resumed
exists (and increasing the refcounting in this case) is already
available. Reuse the helper instead of open-coding the logic here again.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 1c15d72cd3daeeb5bb4f0d94c9f387526fab75ae..a6148eae7b6615d23d6ac665e8f94e2ae69f93b6 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -338,13 +338,8 @@ static bool clk_core_is_enabled(struct clk_core *core)
 	 * taking enable spinlock, but the below check is needed if one tries
 	 * to call it from other places.
 	 */
-	if (core->rpm_enabled) {
-		pm_runtime_get_noresume(core->dev);
-		if (!pm_runtime_active(core->dev)) {
-			ret = false;
-			goto done;
-		}
-	}
+	if (clk_pm_runtime_get_if_active(core))
+		return false;
 
 	/*
 	 * This could be called with the enable lock held, or from atomic
@@ -359,8 +354,7 @@ static bool clk_core_is_enabled(struct clk_core *core)
 
 	ret = core->ops->is_enabled(core->hw);
 done:
-	if (core->rpm_enabled)
-		pm_runtime_put(core->dev);
+	clk_pm_runtime_put(core);
 
 	return ret;
 }

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 05/10] clk: Move runtime PM calls out of the prepare_lock in clk_init()
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (3 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 04/10] clk: Avoid open coded-logic when a there is a helper available Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 06/10] clk: Move runtime PM calls out of the prepare_lock in clk_prepare() Miquel Raynal
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

In order to fix the ABBA locking situation between clock and power
domains, let's disimburse these two locks by preventing any runtime PM
call to happen with the clk prepare_lock mutex acquired.

The __clock_core_init() routine shall preferably call the runtime PM
functions before acquiring the prepare lock, which can be achieved by
the existing clk_pm_runtime_get_all() call. We have no other choice than
waking up all clocks at this stage because we are going the possibly
reparent any random orphan clock in the system at this stage, and thus
will require all its parents and children to be enabled.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index a6148eae7b6615d23d6ac665e8f94e2ae69f93b6..b6decfaf8aa6bd50c2173d19f1ece8f08a95afd8 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -3906,6 +3906,14 @@ static int __clk_core_init(struct clk_core *core)
 	unsigned long rate;
 	int phase;
 
+	/*
+	 * While reparenting the orphans, it is too late to resume the children
+	 * that will get their rates recalculated.
+	 */
+	ret = clk_pm_runtime_get_all();
+	if (ret)
+		return ret;
+
 	clk_prepare_lock();
 
 	/*
@@ -3916,10 +3924,6 @@ static int __clk_core_init(struct clk_core *core)
 	 */
 	core->hw->core = core;
 
-	ret = clk_pm_runtime_get(core);
-	if (ret)
-		goto unlock;
-
 	/* check to see if a clock with this name is already registered */
 	if (clk_core_lookup(core->name)) {
 		pr_debug("%s: clk %s already initialized\n",
@@ -4082,8 +4086,6 @@ static int __clk_core_init(struct clk_core *core)
 
 	clk_core_reparent_orphans_nolock();
 out:
-	clk_pm_runtime_put(core);
-unlock:
 	if (ret) {
 		hlist_del_init(&core->child_node);
 		core->hw->core = NULL;
@@ -4091,6 +4093,8 @@ static int __clk_core_init(struct clk_core *core)
 
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_all();
+
 	if (!ret)
 		clk_debug_register(core);
 

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 06/10] clk: Move runtime PM calls out of the prepare_lock in clk_prepare()
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (4 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 05/10] clk: Move runtime PM calls out of the prepare_lock in clk_init() Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 07/10] clk: Ensure all RPM enabled clocks are enabled before reparenting orphans Miquel Raynal
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

In order to fix the ABBA locking situation between clock and power
domains, let's disimburse these two locks by preventing any runtime PM
call to happen with the clk prepare_lock mutex acquired.

The prepare routine shall preferably call the runtime PM functions
before acquiring the prepare lock. In practice the prepare helper
requires the current clock and all its parents to be runtime
resumed.

There is no need to perform individual calls as we get inner and inner
in recursive calls (ie. following the "clock parents") because, as of
today, runtime PM works by already resuming automatically all the
suppliers which in this case already include the clock parents.

Hence, doing a single runtime PM call at the beginning of the function
is enough. One side effect though is that we now need to check for the
validity of core clk pointers in the runtime PM clk helpers.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index b6decfaf8aa6bd50c2173d19f1ece8f08a95afd8..95f53bc427d8980287bfe668d1c993023e0e078b 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -113,7 +113,7 @@ struct clk {
 /***           runtime pm          ***/
 static int clk_pm_runtime_get(struct clk_core *core)
 {
-	if (!core->rpm_enabled)
+	if (!core || !core->rpm_enabled)
 		return 0;
 
 	return pm_runtime_resume_and_get(core->dev);
@@ -135,7 +135,7 @@ static int clk_pm_runtime_get_if_active(struct clk_core *core)
 
 static void clk_pm_runtime_put(struct clk_core *core)
 {
-	if (!core->rpm_enabled)
+	if (!core || !core->rpm_enabled)
 		return;
 
 	pm_runtime_put_sync(core->dev);
@@ -1081,7 +1081,6 @@ static void clk_core_unprepare(struct clk_core *core)
 
 	trace_clk_unprepare_complete(core);
 	clk_core_unprepare(core->parent);
-	clk_pm_runtime_put(core);
 }
 
 static void clk_core_unprepare_lock(struct clk_core *core)
@@ -1089,6 +1088,8 @@ static void clk_core_unprepare_lock(struct clk_core *core)
 	clk_prepare_lock();
 	clk_core_unprepare(core);
 	clk_prepare_unlock();
+
+	clk_pm_runtime_put(core);
 }
 
 /**
@@ -1121,13 +1122,9 @@ static int clk_core_prepare(struct clk_core *core)
 		return 0;
 
 	if (core->prepare_count == 0) {
-		ret = clk_pm_runtime_get(core);
-		if (ret)
-			return ret;
-
 		ret = clk_core_prepare(core->parent);
 		if (ret)
-			goto runtime_put;
+			return ret;
 
 		trace_clk_prepare(core);
 
@@ -1155,8 +1152,7 @@ static int clk_core_prepare(struct clk_core *core)
 	return 0;
 unprepare:
 	clk_core_unprepare(core->parent);
-runtime_put:
-	clk_pm_runtime_put(core);
+
 	return ret;
 }
 
@@ -1164,10 +1160,17 @@ static int clk_core_prepare_lock(struct clk_core *core)
 {
 	int ret;
 
+	ret = clk_pm_runtime_get(core);
+	if (ret)
+		return ret;
+
 	clk_prepare_lock();
 	ret = clk_core_prepare(core);
 	clk_prepare_unlock();
 
+	if (ret)
+		clk_pm_runtime_put(core);
+
 	return ret;
 }
 

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 07/10] clk: Ensure all RPM enabled clocks are enabled before reparenting orphans
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (5 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 06/10] clk: Move runtime PM calls out of the prepare_lock in clk_prepare() Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 08/10] clk: Move runtime PM calls out of the prepare_lock in clk_unregister() Miquel Raynal
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

In order to fix the ABBA locking situation between clock and power
domains, let's disimburse these two locks by preventing any runtime PM
call to happen with the clk prepare_lock mutex acquired.

Reparenting orphans upon introduction of a new provider means that if
there is a match, the core will recalculate the rates, which requires
the relevant clocks to be enabled.

There is not much we can do to guess which clocks will need rate
recalculation, so better ensure all registered clocks are resumed before
doing the reparenting operation which obviously requires acquiring the
clk prepare_lock to protect against concurrent accesses on the clk tree
topology.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 95f53bc427d8980287bfe668d1c993023e0e078b..4c2f2d2b7735dfbe323fec4e0d331302534bc849 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -5032,8 +5032,16 @@ int of_clk_add_provider(struct device_node *np,
 	mutex_unlock(&of_clk_mutex);
 	pr_debug("Added clock from %pOF\n", np);
 
+	ret = clk_pm_runtime_get_all();
+	if (ret) {
+		of_clk_del_provider(np);
+		return ret;
+	}
+
 	clk_core_reparent_orphans();
 
+	clk_pm_runtime_put_all();
+
 	ret = of_clk_set_defaults(np, true);
 	if (ret < 0)
 		of_clk_del_provider(np);
@@ -5074,8 +5082,16 @@ int of_clk_add_hw_provider(struct device_node *np,
 	mutex_unlock(&of_clk_mutex);
 	pr_debug("Added clk_hw provider from %pOF\n", np);
 
+	ret = clk_pm_runtime_get_all();
+	if (ret) {
+		of_clk_del_provider(np);
+		return ret;
+	}
+
 	clk_core_reparent_orphans();
 
+	clk_pm_runtime_put_all();
+
 	ret = of_clk_set_defaults(np, true);
 	if (ret < 0)
 		of_clk_del_provider(np);

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 08/10] clk: Move runtime PM calls out of the prepare_lock in clk_unregister()
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (6 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 07/10] clk: Ensure all RPM enabled clocks are enabled before reparenting orphans Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-03-26 18:26 ` [PATCH RFC 09/10] clk: Make sure clock parents and children are resumed when necessary Miquel Raynal
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

In order to fix the ABBA locking situation between clock and power
domains, let's disimburse these two locks by preventing any runtime PM
call to happen with the clk prepare_lock mutex acquired.

The clk_unregister() routine calls clk_core_set_parent_nolock() which
can runtime resume basically any clock randomly in the system after
having acquired the main clk lock. In this case the easier approach to
avoid failures is to make sure we wake up all runtime PM enabled clocks
in the system before acquiring the lock. We are not in a hot path
anyway.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 4c2f2d2b7735dfbe323fec4e0d331302534bc849..339ebfa8cca729ffb84127e01a21f741bc270cb3 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4574,6 +4574,9 @@ void clk_unregister(struct clk *clk)
 
 	clk_debug_unregister(clk->core);
 
+	if (clk_pm_runtime_get_all())
+		return;
+
 	clk_prepare_lock();
 
 	ops = clk->core->ops;
@@ -4617,6 +4620,8 @@ void clk_unregister(struct clk *clk)
 					__func__, clk->core->name);
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_all();
+
 	kref_put(&clk->core->ref, __clk_release);
 	free_clk(clk);
 }

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 09/10] clk: Make sure clock parents and children are resumed when necessary
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (7 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 08/10] clk: Move runtime PM calls out of the prepare_lock in clk_unregister() Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-04-09 18:01   ` Rafael J. Wysocki
  2025-03-26 18:26 ` [PATCH RFC 10/10] clk: Fix the ABBA locking issue with runtime PM subcalls Miquel Raynal
  2025-04-15  1:00 ` [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Stephen Boyd
  10 siblings, 1 reply; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

Any pm_runtime_get() call will both wake up the core clock as well as
its parents. But there are some cases which also require resuming the
children clocks. One way to do that is to use the new
pm_runtime_get_consumers() helper.

It's been identified that the following situation may require resuming
the children:
- getting the rate
- setting the rate
- changing the parent (especially since it may produce rate changes)
- putting the clock, which may involve reparenting as well

In order to fix the ABBA locking situation between clock and power
domains, let's disimburse these two locks by resuming the children
outside of the prepare_lock in one function call by using this new
helper.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 339ebfa8cca729ffb84127e01a21f741bc270cb3..26af3a134fa7b9d7f4a77ff473df7e79fd465789 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -119,6 +119,20 @@ static int clk_pm_runtime_get(struct clk_core *core)
 	return pm_runtime_resume_and_get(core->dev);
 }
 
+static int clk_pm_runtime_get_and_consumers(struct clk_core *core)
+{
+	int ret;
+
+	if (!core || !core->rpm_enabled)
+		return 0;
+
+	ret = pm_runtime_resume_and_get(core->dev);
+	if (!ret)
+		pm_runtime_get_consumers(core->dev);
+
+	return ret;
+}
+
 static int clk_pm_runtime_get_if_active(struct clk_core *core)
 {
 	int ret;
@@ -141,6 +155,16 @@ static void clk_pm_runtime_put(struct clk_core *core)
 	pm_runtime_put_sync(core->dev);
 }
 
+static void clk_pm_runtime_put_and_consumers(struct clk_core *core)
+{
+	if (!core || !core->rpm_enabled)
+		return;
+
+	pm_runtime_put_consumers(core->dev);
+
+	pm_runtime_put_sync(core->dev);
+}
+
 /**
  * clk_pm_runtime_get_all() - Runtime "get" all clk provider devices
  *
@@ -2010,10 +2034,15 @@ unsigned long clk_get_rate(struct clk *clk)
 	if (!clk)
 		return 0;
 
+	if (clk_pm_runtime_get_and_consumers(clk->core))
+		return 0;
+
 	clk_prepare_lock();
 	rate = clk_core_get_rate_recalc(clk->core);
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(clk->core);
+
 	return rate;
 }
 EXPORT_SYMBOL_GPL(clk_get_rate);
@@ -2605,6 +2634,10 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
 	if (!clk)
 		return 0;
 
+	ret = clk_pm_runtime_get_and_consumers(clk->core);
+	if (ret)
+		return ret;
+
 	/* prevent racing with updates to the clock topology */
 	clk_prepare_lock();
 
@@ -2618,6 +2651,8 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
 
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(clk->core);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(clk_set_rate);
@@ -2648,6 +2683,10 @@ int clk_set_rate_exclusive(struct clk *clk, unsigned long rate)
 	if (!clk)
 		return 0;
 
+	ret = clk_pm_runtime_get_and_consumers(clk->core);
+	if (ret)
+		return ret;
+
 	/* prevent racing with updates to the clock topology */
 	clk_prepare_lock();
 
@@ -2665,6 +2704,8 @@ int clk_set_rate_exclusive(struct clk *clk, unsigned long rate)
 
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(clk->core);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(clk_set_rate_exclusive);
@@ -2755,12 +2796,18 @@ int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max)
 	if (!clk)
 		return 0;
 
+	ret = clk_pm_runtime_get_and_consumers(clk->core);
+	if (ret)
+		return ret;
+
 	clk_prepare_lock();
 
 	ret = clk_set_rate_range_nolock(clk, min, max);
 
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(clk->core);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(clk_set_rate_range);
@@ -2964,6 +3011,10 @@ int clk_set_parent(struct clk *clk, struct clk *parent)
 	if (!clk)
 		return 0;
 
+	ret = clk_pm_runtime_get_and_consumers(clk->core);
+	if (ret)
+		return ret;
+
 	clk_prepare_lock();
 
 	if (clk->exclusive_count)
@@ -2977,6 +3028,8 @@ int clk_set_parent(struct clk *clk, struct clk *parent)
 
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(clk->core);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(clk_set_parent);
@@ -3459,10 +3512,16 @@ static int clk_rate_set(void *data, u64 val)
 	struct clk_core *core = data;
 	int ret;
 
+	ret = clk_pm_runtime_get_and_consumers(core);
+	if (ret)
+		return ret;
+
 	clk_prepare_lock();
 	ret = clk_core_set_rate_nolock(core, val);
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(core);
+
 	return ret;
 }
 
@@ -3518,11 +3577,18 @@ DEFINE_DEBUGFS_ATTRIBUTE(clk_prepare_enable_fops, clk_prepare_enable_get,
 static int clk_rate_get(void *data, u64 *val)
 {
 	struct clk_core *core = data;
+	int ret;
+
+	ret = clk_pm_runtime_get_and_consumers(core);
+	if (ret)
+		return ret;
 
 	clk_prepare_lock();
 	*val = clk_core_get_rate_recalc(core);
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(core);
+
 	return 0;
 }
 
@@ -3659,12 +3725,18 @@ static ssize_t current_parent_write(struct file *file, const char __user *ubuf,
 	if (!parent)
 		return -ENOENT;
 
+	err = clk_pm_runtime_get_and_consumers(parent);
+	if (err)
+		return err;
+
 	clk_prepare_lock();
 	err = clk_core_set_parent_nolock(core, parent);
 	clk_prepare_unlock();
 	if (err)
 		return err;
 
+	clk_pm_runtime_put_and_consumers(parent);
+
 	return count;
 }
 
@@ -4762,6 +4834,9 @@ void __clk_put(struct clk *clk)
 	if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
 		return;
 
+	if (clk_pm_runtime_get_and_consumers(clk->core))
+		return;
+
 	clk_prepare_lock();
 
 	/*
@@ -4784,6 +4859,8 @@ void __clk_put(struct clk *clk)
 
 	clk_prepare_unlock();
 
+	clk_pm_runtime_put_and_consumers(clk->core);
+
 	owner = clk->core->owner;
 	kref_put(&clk->core->ref, __clk_release);
 	module_put(owner);

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH RFC 10/10] clk: Fix the ABBA locking issue with runtime PM subcalls
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (8 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 09/10] clk: Make sure clock parents and children are resumed when necessary Miquel Raynal
@ 2025-03-26 18:26 ` Miquel Raynal
  2025-04-15  1:00 ` [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Stephen Boyd
  10 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

The clock subsystem is calling runtime PM callbacks after having
acquired its own lock, which is in general problematic, especially
because when PM callbacks enter the power domain subsystem, we have the
following scenario:
mutex_lock(prepare_lock)
mutex_lock(genpd_lock)
But on the other side, devices may enable power domains, which
themselves might require clocks, forcing the following path:
mutex_lock(genpd_lock)
mutex_lock(prepare_lock)

The clk core has been modified in order to avoid the need for "late"
runtime PM calls (ie. inside the clk prepare_lock), so what remains to
be done is to simply remove these inner runtime calls.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/clk/clk.c | 31 +++++--------------------------
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 26af3a134fa7b9d7f4a77ff473df7e79fd465789..652551860201f2d4ed606c55079dc4fb655d9fa0 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1961,10 +1961,9 @@ static unsigned long clk_recalc(struct clk_core *core,
 {
 	unsigned long rate = parent_rate;
 
-	if (core->ops->recalc_rate && !clk_pm_runtime_get(core)) {
+	if (core->ops->recalc_rate)
 		rate = core->ops->recalc_rate(core->hw, parent_rate);
-		clk_pm_runtime_put(core);
-	}
+
 	return rate;
 }
 
@@ -2458,9 +2457,6 @@ static void clk_change_rate(struct clk_core *core)
 		best_parent_rate = core->parent->rate;
 	}
 
-	if (clk_pm_runtime_get(core))
-		return;
-
 	if (core->flags & CLK_SET_RATE_UNGATE) {
 		clk_core_prepare(core);
 		clk_core_enable_lock(core);
@@ -2523,8 +2519,6 @@ static void clk_change_rate(struct clk_core *core)
 	/* handle the new child who might not be in core->children yet */
 	if (core->new_child)
 		clk_change_rate(core->new_child);
-
-	clk_pm_runtime_put(core);
 }
 
 static unsigned long clk_core_req_round_rate_nolock(struct clk_core *core,
@@ -2562,7 +2556,6 @@ static int clk_core_set_rate_nolock(struct clk_core *core,
 {
 	struct clk_core *top, *fail_clk;
 	unsigned long rate;
-	int ret;
 
 	if (!core)
 		return 0;
@@ -2582,28 +2575,21 @@ static int clk_core_set_rate_nolock(struct clk_core *core,
 	if (!top)
 		return -EINVAL;
 
-	ret = clk_pm_runtime_get(core);
-	if (ret)
-		return ret;
-
 	/* notify that we are about to change rates */
 	fail_clk = clk_propagate_rate_change(top, PRE_RATE_CHANGE);
 	if (fail_clk) {
 		pr_debug("%s: failed to set %s rate\n", __func__,
 				fail_clk->name);
 		clk_propagate_rate_change(top, ABORT_RATE_CHANGE);
-		ret = -EBUSY;
-		goto err;
+		return -EBUSY;
 	}
 
 	/* change the rates */
 	clk_change_rate(top);
 
 	core->req_rate = req_rate;
-err:
-	clk_pm_runtime_put(core);
 
-	return ret;
+	return 0;
 }
 
 /**
@@ -2953,16 +2939,12 @@ static int clk_core_set_parent_nolock(struct clk_core *core,
 		p_rate = parent->rate;
 	}
 
-	ret = clk_pm_runtime_get(core);
-	if (ret)
-		return ret;
-
 	/* propagate PRE_RATE_CHANGE notifications */
 	ret = __clk_speculate_rates(core, p_rate);
 
 	/* abort if a driver objects */
 	if (ret & NOTIFY_STOP_MASK)
-		goto runtime_put;
+		return ret;
 
 	/* do the re-parent */
 	ret = __clk_set_parent(core, parent, p_index);
@@ -2975,9 +2957,6 @@ static int clk_core_set_parent_nolock(struct clk_core *core,
 		__clk_recalc_accuracies(core);
 	}
 
-runtime_put:
-	clk_pm_runtime_put(core);
-
 	return ret;
 }
 

-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers
  2025-03-26 18:26 ` [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers Miquel Raynal
@ 2025-03-26 19:18   ` Rafael J. Wysocki
  2025-03-28  9:59     ` Miquel Raynal
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2025-03-26 19:18 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd,
	Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan

On Wed, Mar 26, 2025 at 7:26 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> The runtime PM core currently allows to runtime resume/suspend a device,
> or its suppliers.
>
> Let's make it also possible to runtime resume/suspend consumers.
>
> Consumers and suppliers are seen here through the description made by
> device_links.

It would be good to explain why all of this is needed.

I gather that it is used for resolving some synchronization issues in
the clk framework, but neither the cover letter nor this changelog
explains how it is used.

> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/base/power/runtime.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pm_runtime.h   |  2 ++
>  2 files changed, 56 insertions(+)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 2ee45841486bc73225b3e971164466647b3ce6d3..04bb66c18e3e4a45751fb3f9a6a1267d73757310 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1841,6 +1841,60 @@ void pm_runtime_put_suppliers(struct device *dev)
>         device_links_read_unlock(idx);
>  }
>
> +static void __pm_runtime_get_consumers(struct device *dev)
> +{
> +       struct device_link *link;
> +
> +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
> +                               device_links_read_lock_held())
> +               if (link->flags & DL_FLAG_PM_RUNTIME) {
> +                       pm_runtime_get_sync(link->consumer);
> +                       __pm_runtime_get_consumers(link->consumer);
> +               }
> +}
> +
> +/**
> + * pm_runtime_get_consumers - Resume and reference-count consumer devices.
> + * @dev: Supplier device.
> + */
> +void pm_runtime_get_consumers(struct device *dev)
> +{
> +       int idx;
> +
> +       idx = device_links_read_lock();
> +
> +       __pm_runtime_get_consumers(dev);
> +
> +       device_links_read_unlock(idx);
> +}
> +
> +static void __pm_runtime_put_consumers(struct device *dev)
> +{
> +       struct device_link *link;
> +
> +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
> +                               device_links_read_lock_held())
> +               if (link->flags & DL_FLAG_PM_RUNTIME) {
> +                       pm_runtime_put(link->consumer);
> +                       __pm_runtime_put_consumers(link->consumer);
> +               }
> +}
> +
> +/**
> + * pm_runtime_put_consumers - Drop references to consumer devices.
> + * @dev: Supplier device.
> + */
> +void pm_runtime_put_consumers(struct device *dev)
> +{
> +       int idx;
> +
> +       idx = device_links_read_lock();
> +
> +       __pm_runtime_put_consumers(dev);
> +
> +       device_links_read_unlock(idx);
> +}
> +
>  void pm_runtime_new_link(struct device *dev)
>  {
>         spin_lock_irq(&dev->power.lock);
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index d39dc863f612fe18dc34182117f87908d63c8e6d..151c885a3f421f09509232f144618da62297d61d 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -89,6 +89,8 @@ extern u64 pm_runtime_autosuspend_expiration(struct device *dev);
>  extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
>  extern void pm_runtime_get_suppliers(struct device *dev);
>  extern void pm_runtime_put_suppliers(struct device *dev);
> +extern void pm_runtime_get_consumers(struct device *dev);
> +extern void pm_runtime_put_consumers(struct device *dev);
>  extern void pm_runtime_new_link(struct device *dev);
>  extern void pm_runtime_drop_link(struct device_link *link);
>  extern void pm_runtime_release_supplier(struct device_link *link);
>
> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers
  2025-03-26 19:18   ` Rafael J. Wysocki
@ 2025-03-28  9:59     ` Miquel Raynal
  2025-04-09 17:55       ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Miquel Raynal @ 2025-03-28  9:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Danilo Krummrich,
	Michael Turquette, Stephen Boyd, Thomas Petazzoni, linux-pm,
	linux-kernel, linux-clk, Chen-Yu Tsai, Lucas Stach,
	Laurent Pinchart, Marek Vasut, Ulf Hansson, Kevin Hilman,
	Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo, Shengjiu Wang,
	linux-imx, Ian Ray, Hervé Codina, Luca Ceresoli,
	Saravana Kannan

Hello Rafael,

>> The runtime PM core currently allows to runtime resume/suspend a device,
>> or its suppliers.
>>
>> Let's make it also possible to runtime resume/suspend consumers.
>>
>> Consumers and suppliers are seen here through the description made by
>> device_links.
>
> It would be good to explain why all of this is needed.
>
> I gather that it is used for resolving some synchronization issues in
> the clk framework, but neither the cover letter nor this changelog
> explains how it is used.

The explanation is quite long, there have been already 3 full threads
from people attempting to fix a problem that resides in the clock
subsystem (but that may also be probably problematic in others, just
uncovered so far). I don't know if you took the time to read the cover
letter:
https://lore.kernel.org/linux-clk/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/
It tries to explain the problem and the approach to fix this problem,
but let me try to give a runtime PM focused view of it here.

[Problem]

We do have an ABBA locking situation between clk and any other subsystem
that might be in use during runtime_resume() operations, provided that
these subsystems also make clk calls at some point. The usual suspect
here are power domains.

There are different approaches that can be taken but the one that felt
the most promising when we discussed it during last LPC (and also the
one that was partially implemented in the clk subsystem already for a
tiny portion of it) is the rule that "subsystem locks should not be kept
acquired while calling in some other subsystems".

Typically in the clk subsystem the logic is:

func() {
        mutex_lock(clk);
        runtime_resume(clk);
        ...
}

Whereas what would definitely work without locking issues is the
opposite:

func() {
        runtime_resume(clk);
        mutex_lock(clk);
        ...
}

Of course life is not so simple, and the clock core is highly
recursive, which means inverting the two calls like I hinted above
simply does not work as we go deeper in the subcalls. As a result, we
need to runtime resume *all* the relevant clocks in advance, before
calling functions recursively (the lock itself is allowed to re-enter
and is not blocking in this case).

I followed all possible paths in the clock subsystem and identified 3
main categories. The list of clocks we need to runtime resume in advance
can either be:
1- the parent clocks
2- the child clocks
3- the parent and child clocks
4- all the clocks (typically for debugfs/sysfs purposes).

[Solution 1: discarded]

The first approach to do that was do to some guessing based on the clock
tree topology. Unfortunately this approach does not stand because it is
virtually unbounded. In order to know the clock topology we must acquire
the clock main lock. In order to runtime resume we must release it. As a
result, this logic is virtually unbounded (even though in practice we
would converge at some point). So this approach was discarded by Steven.

[Solution 2: this proposal]

After the LPC discussion with Steven, I also discussed with Saravana
about this and he pointed that since we were using fw_devlink=rpm by
default now, all providers -including clock controllers of course- would
already be runtime resumed the first time we would make a
runtime_resume(clk), and thus all the nested calls were no longer
needed. This native solution was already addressing point #1 above (and
partially point #3) and all I had to do was to make a similar function
for point #2.

And here we are, trying to resume all consumers (from a device link
perspective) which include, but is not limited to, consumer clocks.

I hope this explanation will help understanding this patch and why it is
needed for this series. As stated in the cover letter, I've tried to
keep the changes here to their minimum. Maybe there are other/better
ways to do that and we can discuss them. My priority is however to get
this possible ABBA deadlock situation sorted out.

I can further expand the commit log with these details if you want.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers
  2025-03-28  9:59     ` Miquel Raynal
@ 2025-04-09 17:55       ` Rafael J. Wysocki
  2025-04-10  7:54         ` Miquel Raynal
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2025-04-09 17:55 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd,
	Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan

Hi,

On Fri, Mar 28, 2025 at 10:59 AM Miquel Raynal
<miquel.raynal@bootlin.com> wrote:
>
> Hello Rafael,
>
> >> The runtime PM core currently allows to runtime resume/suspend a device,
> >> or its suppliers.
> >>
> >> Let's make it also possible to runtime resume/suspend consumers.
> >>
> >> Consumers and suppliers are seen here through the description made by
> >> device_links.
> >
> > It would be good to explain why all of this is needed.
> >
> > I gather that it is used for resolving some synchronization issues in
> > the clk framework, but neither the cover letter nor this changelog
> > explains how it is used.
>
> The explanation is quite long, there have been already 3 full threads
> from people attempting to fix a problem that resides in the clock
> subsystem (but that may also be probably problematic in others, just
> uncovered so far). I don't know if you took the time to read the cover
> letter:
> https://lore.kernel.org/linux-clk/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/
> It tries to explain the problem and the approach to fix this problem,
> but let me try to give a runtime PM focused view of it here.
>
> [Problem]
>
> We do have an ABBA locking situation between clk and any other subsystem
> that might be in use during runtime_resume() operations, provided that
> these subsystems also make clk calls at some point. The usual suspect
> here are power domains.
>
> There are different approaches that can be taken but the one that felt
> the most promising when we discussed it during last LPC (and also the
> one that was partially implemented in the clk subsystem already for a
> tiny portion of it) is the rule that "subsystem locks should not be kept
> acquired while calling in some other subsystems".
>
> Typically in the clk subsystem the logic is:
>
> func() {
>         mutex_lock(clk);
>         runtime_resume(clk);
>         ...
> }
>
> Whereas what would definitely work without locking issues is the
> opposite:
>
> func() {
>         runtime_resume(clk);
>         mutex_lock(clk);
>         ...
> }
>
> Of course life is not so simple, and the clock core is highly
> recursive, which means inverting the two calls like I hinted above
> simply does not work as we go deeper in the subcalls. As a result, we
> need to runtime resume *all* the relevant clocks in advance, before
> calling functions recursively (the lock itself is allowed to re-enter
> and is not blocking in this case).
>
> I followed all possible paths in the clock subsystem and identified 3
> main categories. The list of clocks we need to runtime resume in advance
> can either be:
> 1- the parent clocks
> 2- the child clocks
> 3- the parent and child clocks
> 4- all the clocks (typically for debugfs/sysfs purposes).
>
> [Solution 1: discarded]
>
> The first approach to do that was do to some guessing based on the clock
> tree topology. Unfortunately this approach does not stand because it is
> virtually unbounded. In order to know the clock topology we must acquire
> the clock main lock. In order to runtime resume we must release it. As a
> result, this logic is virtually unbounded (even though in practice we
> would converge at some point). So this approach was discarded by Steven.
>
> [Solution 2: this proposal]
>
> After the LPC discussion with Steven, I also discussed with Saravana
> about this and he pointed that since we were using fw_devlink=rpm by
> default now, all providers -including clock controllers of course- would
> already be runtime resumed the first time we would make a
> runtime_resume(clk), and thus all the nested calls were no longer
> needed. This native solution was already addressing point #1 above (and
> partially point #3) and all I had to do was to make a similar function
> for point #2.

So this depends on DT being used and fw_devlink=rpm being used, doesn't it?

You cannot really assume in general that there will be device links
between parents and children.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 09/10] clk: Make sure clock parents and children are resumed when necessary
  2025-03-26 18:26 ` [PATCH RFC 09/10] clk: Make sure clock parents and children are resumed when necessary Miquel Raynal
@ 2025-04-09 18:01   ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2025-04-09 18:01 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd,
	Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan

On Wed, Mar 26, 2025 at 7:26 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Any pm_runtime_get() call will both wake up the core clock as well as
> its parents. But there are some cases which also require resuming the
> children clocks. One way to do that is to use the new
> pm_runtime_get_consumers() helper.
>
> It's been identified that the following situation may require resuming
> the children:
> - getting the rate
> - setting the rate
> - changing the parent (especially since it may produce rate changes)
> - putting the clock, which may involve reparenting as well
>
> In order to fix the ABBA locking situation between clock and power
> domains, let's disimburse these two locks by resuming the children
> outside of the prepare_lock in one function call by using this new
> helper.
>
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/clk/clk.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 339ebfa8cca729ffb84127e01a21f741bc270cb3..26af3a134fa7b9d7f4a77ff473df7e79fd465789 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -119,6 +119,20 @@ static int clk_pm_runtime_get(struct clk_core *core)
>         return pm_runtime_resume_and_get(core->dev);
>  }
>
> +static int clk_pm_runtime_get_and_consumers(struct clk_core *core)
> +{
> +       int ret;
> +
> +       if (!core || !core->rpm_enabled)
> +               return 0;
> +
> +       ret = pm_runtime_resume_and_get(core->dev);
> +       if (!ret)
> +               pm_runtime_get_consumers(core->dev);

So here, you also need to take children into account directly.

> +
> +       return ret;
> +}
> +
>  static int clk_pm_runtime_get_if_active(struct clk_core *core)
>  {
>         int ret;
> @@ -141,6 +155,16 @@ static void clk_pm_runtime_put(struct clk_core *core)
>         pm_runtime_put_sync(core->dev);
>  }
>
> +static void clk_pm_runtime_put_and_consumers(struct clk_core *core)
> +{
> +       if (!core || !core->rpm_enabled)
> +               return;
> +
> +       pm_runtime_put_consumers(core->dev);

And here too.

> +
> +       pm_runtime_put_sync(core->dev);
> +}
> +
>  /**
>   * clk_pm_runtime_get_all() - Runtime "get" all clk provider devices
>   *
> @@ -2010,10 +2034,15 @@ unsigned long clk_get_rate(struct clk *clk)
>         if (!clk)
>                 return 0;
>
> +       if (clk_pm_runtime_get_and_consumers(clk->core))
> +               return 0;
> +
>         clk_prepare_lock();
>         rate = clk_core_get_rate_recalc(clk->core);
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(clk->core);
> +
>         return rate;
>  }
>  EXPORT_SYMBOL_GPL(clk_get_rate);
> @@ -2605,6 +2634,10 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
>         if (!clk)
>                 return 0;
>
> +       ret = clk_pm_runtime_get_and_consumers(clk->core);
> +       if (ret)
> +               return ret;
> +
>         /* prevent racing with updates to the clock topology */
>         clk_prepare_lock();
>
> @@ -2618,6 +2651,8 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
>
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(clk->core);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL_GPL(clk_set_rate);
> @@ -2648,6 +2683,10 @@ int clk_set_rate_exclusive(struct clk *clk, unsigned long rate)
>         if (!clk)
>                 return 0;
>
> +       ret = clk_pm_runtime_get_and_consumers(clk->core);
> +       if (ret)
> +               return ret;
> +
>         /* prevent racing with updates to the clock topology */
>         clk_prepare_lock();
>
> @@ -2665,6 +2704,8 @@ int clk_set_rate_exclusive(struct clk *clk, unsigned long rate)
>
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(clk->core);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL_GPL(clk_set_rate_exclusive);
> @@ -2755,12 +2796,18 @@ int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max)
>         if (!clk)
>                 return 0;
>
> +       ret = clk_pm_runtime_get_and_consumers(clk->core);
> +       if (ret)
> +               return ret;
> +
>         clk_prepare_lock();
>
>         ret = clk_set_rate_range_nolock(clk, min, max);
>
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(clk->core);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL_GPL(clk_set_rate_range);
> @@ -2964,6 +3011,10 @@ int clk_set_parent(struct clk *clk, struct clk *parent)
>         if (!clk)
>                 return 0;
>
> +       ret = clk_pm_runtime_get_and_consumers(clk->core);
> +       if (ret)
> +               return ret;
> +
>         clk_prepare_lock();
>
>         if (clk->exclusive_count)
> @@ -2977,6 +3028,8 @@ int clk_set_parent(struct clk *clk, struct clk *parent)
>
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(clk->core);
> +
>         return ret;
>  }
>  EXPORT_SYMBOL_GPL(clk_set_parent);
> @@ -3459,10 +3512,16 @@ static int clk_rate_set(void *data, u64 val)
>         struct clk_core *core = data;
>         int ret;
>
> +       ret = clk_pm_runtime_get_and_consumers(core);
> +       if (ret)
> +               return ret;
> +
>         clk_prepare_lock();
>         ret = clk_core_set_rate_nolock(core, val);
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(core);
> +
>         return ret;
>  }
>
> @@ -3518,11 +3577,18 @@ DEFINE_DEBUGFS_ATTRIBUTE(clk_prepare_enable_fops, clk_prepare_enable_get,
>  static int clk_rate_get(void *data, u64 *val)
>  {
>         struct clk_core *core = data;
> +       int ret;
> +
> +       ret = clk_pm_runtime_get_and_consumers(core);
> +       if (ret)
> +               return ret;
>
>         clk_prepare_lock();
>         *val = clk_core_get_rate_recalc(core);
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(core);
> +
>         return 0;
>  }
>
> @@ -3659,12 +3725,18 @@ static ssize_t current_parent_write(struct file *file, const char __user *ubuf,
>         if (!parent)
>                 return -ENOENT;
>
> +       err = clk_pm_runtime_get_and_consumers(parent);
> +       if (err)
> +               return err;
> +
>         clk_prepare_lock();
>         err = clk_core_set_parent_nolock(core, parent);
>         clk_prepare_unlock();
>         if (err)
>                 return err;
>
> +       clk_pm_runtime_put_and_consumers(parent);
> +
>         return count;
>  }
>
> @@ -4762,6 +4834,9 @@ void __clk_put(struct clk *clk)
>         if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
>                 return;
>
> +       if (clk_pm_runtime_get_and_consumers(clk->core))
> +               return;
> +
>         clk_prepare_lock();
>
>         /*
> @@ -4784,6 +4859,8 @@ void __clk_put(struct clk *clk)
>
>         clk_prepare_unlock();
>
> +       clk_pm_runtime_put_and_consumers(clk->core);
> +
>         owner = clk->core->owner;
>         kref_put(&clk->core->ref, __clk_release);
>         module_put(owner);
>
> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers
  2025-04-09 17:55       ` Rafael J. Wysocki
@ 2025-04-10  7:54         ` Miquel Raynal
  2025-04-10  9:55           ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Miquel Raynal @ 2025-04-10  7:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Danilo Krummrich,
	Michael Turquette, Stephen Boyd, Thomas Petazzoni, linux-pm,
	linux-kernel, linux-clk, Chen-Yu Tsai, Lucas Stach,
	Laurent Pinchart, Marek Vasut, Ulf Hansson, Kevin Hilman,
	Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo, Shengjiu Wang,
	linux-imx, Ian Ray, Hervé Codina, Luca Ceresoli,
	Saravana Kannan

Hi Rafael,

Thanks for taking the time to read all that.

>> After the LPC discussion with Steven, I also discussed with Saravana
>> about this and he pointed that since we were using fw_devlink=rpm by
>> default now, all providers -including clock controllers of course- would
>> already be runtime resumed the first time we would make a
>> runtime_resume(clk), and thus all the nested calls were no longer
>> needed. This native solution was already addressing point #1 above (and
>> partially point #3) and all I had to do was to make a similar function
>> for point #2.
>
> So this depends on DT being used and fw_devlink=rpm being used,
> doesn't it?

DT, not really. fw_devlink=rpm however, yes.

> You cannot really assume in general that there will be device links
> between parents and children.

But if runtime PM already mandates fw_devlink to be the information
source (which, IIRC is the case since fw_devlink=rpm), then why wouldn't
this approach work? For sure there may be holes in fw_devlink, but
what is the reason for it if we cannot use it?

In other words, are you suggesting that this approach is invalid? If yes
could you elaborate a bit? For this approach to work we do not need all
the parenting to be perfectly described, just relationships between
clock consumers and providers, which are in general rather basic.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers
  2025-04-10  7:54         ` Miquel Raynal
@ 2025-04-10  9:55           ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2025-04-10  9:55 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Michael Turquette, Stephen Boyd,
	Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan

Hi,

On Thu, Apr 10, 2025 at 9:54 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Rafael,
>
> Thanks for taking the time to read all that.
>
> >> After the LPC discussion with Steven, I also discussed with Saravana
> >> about this and he pointed that since we were using fw_devlink=rpm by
> >> default now, all providers -including clock controllers of course- would
> >> already be runtime resumed the first time we would make a
> >> runtime_resume(clk), and thus all the nested calls were no longer
> >> needed. This native solution was already addressing point #1 above (and
> >> partially point #3) and all I had to do was to make a similar function
> >> for point #2.
> >
> > So this depends on DT being used and fw_devlink=rpm being used,
> > doesn't it?
>
> DT, not really. fw_devlink=rpm however, yes.

Which means DT because fw_devlink=rpm is DT-specific.  At least it is
not used on systems where ACPI is the firmware interface.

> > You cannot really assume in general that there will be device links
> > between parents and children.
>
> But if runtime PM already mandates fw_devlink to be the information
> source (which, IIRC is the case since fw_devlink=rpm), then why wouldn't
> this approach work? For sure there may be holes in fw_devlink, but
> what is the reason for it if we cannot use it?

Well, see above.

> In other words, are you suggesting that this approach is invalid? If yes
> could you elaborate a bit? For this approach to work we do not need all
> the parenting to be perfectly described, just relationships between
> clock consumers and providers, which are in general rather basic.

So you know which devices are parents and children without fw_devlink
and this information can be used readily on all systems AFAICS.

IIUC, the overall approach is to resume the entire hierarchy before
making changes that may deadlock and I think that this is a good idea
in general.

However, you need to do it in a way that's usable on all systems and
when you walk the hierarchy from top to bottom, you need to do it
recursively I think because resuming a device doesn't cause its
children or consumers to resume.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM
  2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
                   ` (9 preceding siblings ...)
  2025-03-26 18:26 ` [PATCH RFC 10/10] clk: Fix the ABBA locking issue with runtime PM subcalls Miquel Raynal
@ 2025-04-15  1:00 ` Stephen Boyd
  2025-10-03 16:24   ` Luca Ceresoli
  10 siblings, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2025-04-15  1:00 UTC (permalink / raw)
  To: Danilo Krummrich, Greg Kroah-Hartman, Len Brown,
	Michael Turquette, Miquel Raynal, Pavel Machek, Rafael J. Wysocki
  Cc: Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Luca Ceresoli, Saravana Kannan, Miquel Raynal

Quoting Miquel Raynal (2025-03-26 11:26:15)
> As explained in the following thread, there is a known ABBA locking
> dependency between clk and runtime PM.
> Link: https://lore.kernel.org/linux-clk/20240527181928.4fc6b5f0@xps-13/
> 
> The problem is that the clk subsystem uses a mutex to protect concurrent
> accesses to its tree structure, and so do other subsystems such as
> generic power domains. While it holds its own mutex, the clk subsystem
> performs runtime PM calls which end up executing callbacks from other
> subsystems (again, gen PD is in the loop). But typically power domains
> may also need to perform clock related operations, and thus the
> following two situations may happen:
> 
> mutex_lock(clk);
> mutex_lock(genpd);
> 
> or
> 
> mutex_lock(genpd);
> mutex_lock(clk);
> 
> As of today I know that at least NXP i.MX8MP and MediaTek MT8183 SoCs
> are complex enough to face this kind of issues.
> 
> There's been a first workaround to "silence" lockdep with the most
> obvious case triggering the warning: making sure all clocks are RPM
> enabled before running the clk_disable_unused() work, but this is just
> addressing one situation among many other potentially problematic
> situations. In the past, both Laurent Pinchart and Marek Vasut have
> experienced these issues when enabling HDMI and audio support,
> respectively.
> 
> Following a discussion we had at last Plumbers with Steven, I am
> proposing to decouple both locks by changing a bit the clk approach:
> let's always runtime resume all clocks that we *might* need before
> taking the clock lock. But how do we know the list? Well, depending on
> the situation we may either need to wake up:
> - the upper part of the tree during prepare/unprepare operations.
> - the lower part of the tree during (read) rate operations.
> - the upper part and the lower part of the tree otherwise (especially
>   during rate changes which may involve reparenting).

Thanks for taking on this work. This problem is coming up more and more
often.

> 
> Luckily, we do not need to do that by hand, are more importantly we do
> not need to use the clock tree for that because thanks to the work from
> Saravana, we already have device links describing exhaustively the
> consumer/supplier relationships. The clock topology (from a runtime PM
> perspective) is reflected in these links. In practice, we do not care
> about all consumers, but the few clock operations that will actually
> trigger runtime PM operations are probably not impacting enough to
> justify something more complex.

This won't always work, for a couple reasons. First because clk drivers
aren't required to describe their parent clks that are outside the clk
controller by using DT with a 'clocks' property in the clk controller
node. Second because there can be a many to one relationship between a
struct device and struct device_node. We're trying to push drivers to be
written in a way that the binding has the 'clocks' property, but that
isn't always the case, so we still need a solution that works in all
cases so as to not regress old (legacy?) implementations or ones that
divide a platform device into some number of auxiliary devices and
drivers.

One idea to do that would be to implement the device links between clk
controller devices based on all possible parents of the clk. We support
lazily registering clks though, meaning a parent could be registered at
any time, so we would have to explore the clk tree each time a clk is
registered to see if any new clks need to be found and device links made
between devices. The general algorithm is probably something like:

  clk_register()
   make_links_for_node()
    if device node missing 'clocks' property
     for each parent string
      pclk = find parent clk
      pdev = pclk->dev
      link pdev to dev node
    else
     for each clk in clocks property
      pclk = find parent clk
      pdev = pclk->dev
      link pdev to dev node

We have to get the parent clk in all cases because we don't know which
device it may be registered for (the platform device or auxiliary
device). If we optimize here, I'd prefer we optimize for the case where
the 'clocks' property is present to encourage migration. Maybe what we
can do is make some structure for a clk controller and have a linked
list of those that we look up when a new clk is registered. We actually
have that already with 'struct clock_provider' so maybe we need to
extend that.

Stash the device pointer in there and some variable sized array of the
clk_core pointers to the external clks. In the 'clocks' DT property
case, we can size this immediately and map the array index to the
property but in the non-property case we'll have to grow this array each
time a new clk is found to be a parent of the device. Maybe for that we
should just have some other sort of linked list of clk_core pointers
that we continue to stack clks onto.

  struct clock_provider {
    void (*clk_init_cb)(struct device_node *);
    struct device *dev;
    struct device_node *np;
    struct list_head node;
    struct clk_core *legacy_clks; // Or struct list_head legacy?
    size_t len_clocks;
    struct clk_core clocks_property[];
 };

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM
  2025-04-15  1:00 ` [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Stephen Boyd
@ 2025-10-03 16:24   ` Luca Ceresoli
  0 siblings, 0 replies; 19+ messages in thread
From: Luca Ceresoli @ 2025-10-03 16:24 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Danilo Krummrich, Greg Kroah-Hartman, Len Brown,
	Michael Turquette, Miquel Raynal, Pavel Machek, Rafael J. Wysocki,
	Thomas Petazzoni, linux-pm, linux-kernel, linux-clk, Chen-Yu Tsai,
	Lucas Stach, Laurent Pinchart, Marek Vasut, Ulf Hansson,
	Kevin Hilman, Fabio Estevam, Jacky Bai, Peng Fan, Shawn Guo,
	Shengjiu Wang, linux-imx, Ian Ray, Hervé Codina,
	Saravana Kannan

Hello Stephen, all,

On Mon, 14 Apr 2025 18:00:15 -0700
Stephen Boyd <sboyd@kernel.org> wrote:

> Quoting Miquel Raynal (2025-03-26 11:26:15)
> > As explained in the following thread, there is a known ABBA locking
> > dependency between clk and runtime PM.
> > Link: https://lore.kernel.org/linux-clk/20240527181928.4fc6b5f0@xps-13/
> > 
> > The problem is that the clk subsystem uses a mutex to protect concurrent
> > accesses to its tree structure, and so do other subsystems such as
> > generic power domains. While it holds its own mutex, the clk subsystem
> > performs runtime PM calls which end up executing callbacks from other
> > subsystems (again, gen PD is in the loop). But typically power domains
> > may also need to perform clock related operations, and thus the
> > following two situations may happen:
> > 
> > mutex_lock(clk);
> > mutex_lock(genpd);
> > 
> > or
> > 
> > mutex_lock(genpd);
> > mutex_lock(clk);
> > 
> > As of today I know that at least NXP i.MX8MP and MediaTek MT8183 SoCs
> > are complex enough to face this kind of issues.
> > 
> > There's been a first workaround to "silence" lockdep with the most
> > obvious case triggering the warning: making sure all clocks are RPM
> > enabled before running the clk_disable_unused() work, but this is just
> > addressing one situation among many other potentially problematic
> > situations. In the past, both Laurent Pinchart and Marek Vasut have
> > experienced these issues when enabling HDMI and audio support,
> > respectively.
> > 
> > Following a discussion we had at last Plumbers with Steven, I am
> > proposing to decouple both locks by changing a bit the clk approach:
> > let's always runtime resume all clocks that we *might* need before
> > taking the clock lock. But how do we know the list? Well, depending on
> > the situation we may either need to wake up:
> > - the upper part of the tree during prepare/unprepare operations.
> > - the lower part of the tree during (read) rate operations.
> > - the upper part and the lower part of the tree otherwise (especially
> >   during rate changes which may involve reparenting).  
> 
> Thanks for taking on this work. This problem is coming up more and more
> often.

Reviving this thread after today I had a very rare occurrence of
apparently this same issue:

  WARNING: possible circular locking dependency detected

It happened on imx8mp, on a board and with a setup that I'm using since
many months to do unrelated development (mostly DRM). It was a very rare
occurrence because I always have clk_ignore_unused in my kernel cmdline.

On my setup that warning appeared exactly once in thousands of boots
I've done in several months. Just rebooting without changing anything
and it didn't show up again.

Here's the full warning message:

[    5.077473] ======================================================
[    5.083658] WARNING: possible circular locking dependency detected
[    5.089845] 6.17.0-rc4+ #2 Tainted: G                T  
[    5.095164] ------------------------------------------------------
[    5.101346] kworker/u16:4/52 is trying to acquire lock:
[    5.106576] ffff0000016ae740 (&genpd->mlock){+.+.}-{4:4}, at: genpd_lock_mtx+0x20/0x38
[    5.114533] 
[    5.114533] but task is already holding lock:
[    5.120368] ffff800084eb5258 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x38/0xc0
[    5.128404] 
[    5.128404] which lock already depends on the new lock.
[    5.128404] 
[    5.136583] 
[    5.136583] the existing dependency chain (in reverse order) is:
[    5.144070] 
[    5.144070] -> #1 (prepare_lock){+.+.}-{4:4}:
[    5.149924]        __mutex_lock+0xb8/0x7f0
[    5.154034]        mutex_lock_nested+0x2c/0x40
[    5.158487]        clk_prepare_lock+0x58/0xc0
[    5.162849]        clk_prepare+0x28/0x58
[    5.166780]        clk_bulk_prepare+0x54/0xe8
[    5.171141]        imx_pgc_power_up+0x80/0x378
[    5.175592]        _genpd_power_on+0xa0/0x168
[    5.179955]        genpd_power_on+0xd8/0x248
[    5.184234]        genpd_runtime_resume+0x12c/0x298
[    5.189121]        __rpm_callback+0x50/0x200
[    5.193400]        rpm_callback+0x7c/0x90
[    5.197414]        rpm_resume+0x534/0x718
[    5.201432]        __pm_runtime_resume+0x58/0xa8
[    5.206056]        pm_runtime_get_suppliers+0x6c/0xa0
[    5.211117]        __driver_probe_device+0x50/0x140
[    5.216002]        driver_probe_device+0xe0/0x170
[    5.220710]        __driver_attach+0xa0/0x1c0
[    5.225074]        bus_for_each_dev+0x90/0xf8
[    5.229436]        driver_attach+0x2c/0x40
[    5.233538]        bus_add_driver+0xec/0x218
[    5.237816]        driver_register+0x64/0x138
[    5.242178]        __platform_driver_register+0x2c/0x40
[    5.247411]        hotplug_bridge_dynconn_get_modes+0x28/0x48 [hotplug_bridge]
[    5.254654]        do_one_initcall+0x84/0x358
[    5.259020]        do_init_module+0x60/0x268
[    5.263298]        load_module+0x1fc4/0x2108
[    5.267574]        init_module_from_file+0x90/0xe0
[    5.272372]        idempotent_init_module+0x1f8/0x300
[    5.277432]        __arm64_sys_finit_module+0x6c/0xb8
[    5.282491]        invoke_syscall+0x50/0x120
[    5.286771]        el0_svc_common.constprop.0+0x48/0xf0
[    5.292004]        do_el0_svc+0x24/0x38
[    5.295848]        el0_svc+0x4c/0x160
[    5.299519]        el0t_64_sync_handler+0xa0/0xe8
[    5.304229]        el0t_64_sync+0x198/0x1a0
[    5.308420] 
[    5.308420] -> #0 (&genpd->mlock){+.+.}-{4:4}:
[    5.314359]        __lock_acquire+0x1338/0x1f50
[    5.318897]        lock_acquire+0x1c4/0x350
[    5.323089]        __mutex_lock+0xb8/0x7f0
[    5.327194]        mutex_lock_nested+0x2c/0x40
[    5.331645]        genpd_lock_mtx+0x20/0x38
[    5.335834]        genpd_runtime_resume+0x118/0x298
[    5.340721]        __rpm_callback+0x50/0x200
[    5.344997]        rpm_callback+0x7c/0x90
[    5.349013]        rpm_resume+0x534/0x718
[    5.353029]        __pm_runtime_resume+0x58/0xa8
[    5.357653]        clk_pm_runtime_get.part.0.isra.0+0x24/0x98
[    5.363408]        __clk_register+0x51c/0x970
[    5.367771]        devm_clk_hw_register+0x64/0xe8
[    5.372481]        imx8mp_hsio_blk_ctrl_probe+0xa0/0xf8
[    5.377712]        imx8mp_blk_ctrl_probe+0x358/0x568
[    5.382684]        platform_probe+0x64/0xa8
[    5.386875]        really_probe+0xc4/0x2b8
[    5.390976]        __driver_probe_device+0x80/0x140
[    5.395860]        driver_probe_device+0xe0/0x170
[    5.400571]        __device_attach_driver+0xc0/0x148
[    5.405542]        bus_for_each_drv+0x9c/0x108
[    5.409991]        __device_attach+0xa8/0x1a0
[    5.414354]        device_initial_probe+0x1c/0x30
[    5.419065]        bus_probe_device+0xb4/0xc0
[    5.423428]        deferred_probe_work_func+0x90/0xd8
[    5.428485]        process_one_work+0x214/0x618
[    5.433027]        worker_thread+0x1b4/0x368
[    5.437305]        kthread+0x150/0x238
[    5.441062]        ret_from_fork+0x10/0x20
[    5.445165] 
[    5.445165] other info that might help us debug this:
[    5.445165] 
[    5.453171]  Possible unsafe locking scenario:
[    5.453171] 
[    5.459091]        CPU0                    CPU1
[    5.463622]        ----                    ----
[    5.468151]   lock(prepare_lock);
[    5.471476]                                lock(&genpd->mlock);
[    5.477405]                                lock(prepare_lock);
[    5.483248]   lock(&genpd->mlock);
[    5.486655] 
[    5.486655]  *** DEADLOCK ***
[    5.486655] 
[    5.492577] 4 locks held by kworker/u16:4/52:
[    5.496937]  #0: ffff000000030948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x198/0x618
[    5.507051]  #1: ffff800086e3bd70 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1c0/0x618
[    5.516299]  #2: ffff000000c608f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x44/0x1a0
[    5.524680]  #3: ffff800084eb5258 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x38/0xc0
[    5.533150] 
[    5.533150] stack backtrace:
[    5.537512] CPU: 2 UID: 0 PID: 52 Comm: kworker/u16:4 Tainted: G                T   6.17.0-rc4+ #2 PREEMPT 
[    5.547262] Tainted: [T]=RANDSTRUCT
[    5.550752] Hardware name: ...
[    5.557284] Workqueue: events_unbound deferred_probe_work_func
[    5.563128] Call trace:
[    5.565578]  show_stack+0x20/0x38 (C)
[    5.569251]  dump_stack_lvl+0x8c/0xd0
[    5.572919]  dump_stack+0x18/0x28
[    5.576239]  print_circular_bug+0x28c/0x370
[    5.580431]  check_noncircular+0x178/0x190
[    5.584537]  __lock_acquire+0x1338/0x1f50
[    5.588558]  lock_acquire+0x1c4/0x350
[    5.592231]  __mutex_lock+0xb8/0x7f0
[    5.595815]  mutex_lock_nested+0x2c/0x40
[    5.599750]  genpd_lock_mtx+0x20/0x38
[    5.603418]  genpd_runtime_resume+0x118/0x298
[    5.607786]  __rpm_callback+0x50/0x200
[    5.611543]  rpm_callback+0x7c/0x90
[    5.615041]  rpm_resume+0x534/0x718
[    5.618539]  __pm_runtime_resume+0x58/0xa8
[    5.622644]  clk_pm_runtime_get.part.0.isra.0+0x24/0x98
[    5.627876]  __clk_register+0x51c/0x970
[    5.631717]  devm_clk_hw_register+0x64/0xe8
[    5.635909]  imx8mp_hsio_blk_ctrl_probe+0xa0/0xf8
[    5.640622]  imx8mp_blk_ctrl_probe+0x358/0x568
[    5.645071]  platform_probe+0x64/0xa8
[    5.648743]  really_probe+0xc4/0x2b8
[    5.652326]  __driver_probe_device+0x80/0x140
[    5.656693]  driver_probe_device+0xe0/0x170
[    5.660886]  __device_attach_driver+0xc0/0x148
[    5.665339]  bus_for_each_drv+0x9c/0x108
[    5.669269]  __device_attach+0xa8/0x1a0
[    5.673115]  device_initial_probe+0x1c/0x30
[    5.677308]  bus_probe_device+0xb4/0xc0
[    5.681152]  deferred_probe_work_func+0x90/0xd8
[    5.685691]  process_one_work+0x214/0x618
[    5.689713]  worker_thread+0x1b4/0x368
[    5.693471]  kthread+0x150/0x238
[    5.696709]  ret_from_fork+0x10/0x20

You're welcome to ask for more info, even though I'm afraid I might be
unable to provide them given how rare this event is.

Best regards,
Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-10-03 16:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-26 18:26 [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 01/10] PM: runtime: Add helpers to resume consumers Miquel Raynal
2025-03-26 19:18   ` Rafael J. Wysocki
2025-03-28  9:59     ` Miquel Raynal
2025-04-09 17:55       ` Rafael J. Wysocki
2025-04-10  7:54         ` Miquel Raynal
2025-04-10  9:55           ` Rafael J. Wysocki
2025-03-26 18:26 ` [PATCH RFC 02/10] clk: Improve comments with usual punctuation Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 03/10] clk: Avoid non needed runtime PM calls Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 04/10] clk: Avoid open coded-logic when a there is a helper available Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 05/10] clk: Move runtime PM calls out of the prepare_lock in clk_init() Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 06/10] clk: Move runtime PM calls out of the prepare_lock in clk_prepare() Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 07/10] clk: Ensure all RPM enabled clocks are enabled before reparenting orphans Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 08/10] clk: Move runtime PM calls out of the prepare_lock in clk_unregister() Miquel Raynal
2025-03-26 18:26 ` [PATCH RFC 09/10] clk: Make sure clock parents and children are resumed when necessary Miquel Raynal
2025-04-09 18:01   ` Rafael J. Wysocki
2025-03-26 18:26 ` [PATCH RFC 10/10] clk: Fix the ABBA locking issue with runtime PM subcalls Miquel Raynal
2025-04-15  1:00 ` [PATCH RFC 00/10] Fix the ABBA locking situation between clk and runtime PM Stephen Boyd
2025-10-03 16:24   ` Luca Ceresoli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).