linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH 0/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks
@ 2017-01-12 17:17 Ulf Hansson
  2017-01-12 17:17 ` [RESEND PATCH 1/2] PM / Domains: Rename functions in genpd for power on/off Ulf Hansson
  2017-01-12 17:17 ` [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
  0 siblings, 2 replies; 6+ messages in thread
From: Ulf Hansson @ 2017-01-12 17:17 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ulf Hansson, linux-pm
  Cc: Len Brown, Pavel Machek, Kevin Hilman, Geert Uytterhoeven,
	Lina Iyer, Jon Hunter, Marek Szyprowski, Brian Norris

This problem was brought up by Brian Norris [1], and I decided to help out
fixing the issue in genpd.

While working on it, I realized that the execution path when powering on/off a
PM domain in genpd, wasn't self-explanatory, so I folded in a change which
improved the code around that a bit.

I did not target stable or fixes for this, but perhaps we should!?

[1]
https://marc.info/?l=linux-pm&m=147990652405772&w=2


Ulf Hansson (2):
  PM / Domains: Rename functions in genpd for power on/off
  PM / Domains: Fix asynchronous execution of *noirq() callbacks

 drivers/base/power/domain.c | 116 +++++++++++++++++++++++---------------------
 1 file changed, 60 insertions(+), 56 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RESEND PATCH 1/2] PM / Domains: Rename functions in genpd for power on/off
  2017-01-12 17:17 [RESEND PATCH 0/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
@ 2017-01-12 17:17 ` Ulf Hansson
  2017-01-13 19:53   ` Kevin Hilman
  2017-01-12 17:17 ` [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
  1 sibling, 1 reply; 6+ messages in thread
From: Ulf Hansson @ 2017-01-12 17:17 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ulf Hansson, linux-pm
  Cc: Len Brown, Pavel Machek, Kevin Hilman, Geert Uytterhoeven,
	Lina Iyer, Jon Hunter, Marek Szyprowski, Brian Norris

Currently the mix of genpd_poweron(), genpd_power_on(),
genpd_sync_poweron() and the ->power_on() callback, makes a bit difficult
to follow the path of execution. The similar applies to the functions
dealing with power off.

In a way to improve this understanding, let's do the following renaming:

genpd_power_on() ->  _genpd_power_on()
genpd_poweron() -> genpd_power_on()
genpd_sync_poweron() -> genpd_sync_power_on()

genpd_power_off() -> _genpd_power_off()
genpd_poweroff() -> genpd_power_off()
genpd_sync_poweroff() -> genpd_sync_power_off()
genpd_poweroff_unused() -> genpd_power_off_unused()

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/base/power/domain.c | 72 ++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 2997026..fd2e3e1 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -201,7 +201,7 @@ static void genpd_sd_counter_inc(struct generic_pm_domain *genpd)
 	smp_mb__after_atomic();
 }
 
-static int genpd_power_on(struct generic_pm_domain *genpd, bool timed)
+static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed)
 {
 	unsigned int state_idx = genpd->state_idx;
 	ktime_t time_start;
@@ -231,7 +231,7 @@ static int genpd_power_on(struct generic_pm_domain *genpd, bool timed)
 	return ret;
 }
 
-static int genpd_power_off(struct generic_pm_domain *genpd, bool timed)
+static int _genpd_power_off(struct generic_pm_domain *genpd, bool timed)
 {
 	unsigned int state_idx = genpd->state_idx;
 	ktime_t time_start;
@@ -262,10 +262,10 @@ static int genpd_power_off(struct generic_pm_domain *genpd, bool timed)
 }
 
 /**
- * genpd_queue_power_off_work - Queue up the execution of genpd_poweroff().
+ * genpd_queue_power_off_work - Queue up the execution of genpd_power_off().
  * @genpd: PM domain to power off.
  *
- * Queue up the execution of genpd_poweroff() unless it's already been done
+ * Queue up the execution of genpd_power_off() unless it's already been done
  * before.
  */
 static void genpd_queue_power_off_work(struct generic_pm_domain *genpd)
@@ -274,14 +274,14 @@ static void genpd_queue_power_off_work(struct generic_pm_domain *genpd)
 }
 
 /**
- * genpd_poweron - Restore power to a given PM domain and its masters.
+ * genpd_power_on - Restore power to a given PM domain and its masters.
  * @genpd: PM domain to power up.
  * @depth: nesting count for lockdep.
  *
  * Restore power to @genpd and all of its masters so that it is possible to
  * resume a device belonging to it.
  */
-static int genpd_poweron(struct generic_pm_domain *genpd, unsigned int depth)
+static int genpd_power_on(struct generic_pm_domain *genpd, unsigned int depth)
 {
 	struct gpd_link *link;
 	int ret = 0;
@@ -300,7 +300,7 @@ static int genpd_poweron(struct generic_pm_domain *genpd, unsigned int depth)
 		genpd_sd_counter_inc(master);
 
 		genpd_lock_nested(master, depth + 1);
-		ret = genpd_poweron(master, depth + 1);
+		ret = genpd_power_on(master, depth + 1);
 		genpd_unlock(master);
 
 		if (ret) {
@@ -309,7 +309,7 @@ static int genpd_poweron(struct generic_pm_domain *genpd, unsigned int depth)
 		}
 	}
 
-	ret = genpd_power_on(genpd, true);
+	ret = _genpd_power_on(genpd, true);
 	if (ret)
 		goto err;
 
@@ -368,14 +368,14 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb,
 }
 
 /**
- * genpd_poweroff - Remove power from a given PM domain.
+ * genpd_power_off - Remove power from a given PM domain.
  * @genpd: PM domain to power down.
  * @is_async: PM domain is powered down from a scheduled work
  *
  * If all of the @genpd's devices have been suspended and all of its subdomains
  * have been powered down, remove power from @genpd.
  */
-static int genpd_poweroff(struct generic_pm_domain *genpd, bool is_async)
+static int genpd_power_off(struct generic_pm_domain *genpd, bool is_async)
 {
 	struct pm_domain_data *pdd;
 	struct gpd_link *link;
@@ -427,13 +427,13 @@ static int genpd_poweroff(struct generic_pm_domain *genpd, bool is_async)
 
 		/*
 		 * If sd_count > 0 at this point, one of the subdomains hasn't
-		 * managed to call genpd_poweron() for the master yet after
-		 * incrementing it.  In that case genpd_poweron() will wait
+		 * managed to call genpd_power_on() for the master yet after
+		 * incrementing it.  In that case genpd_power_on() will wait
 		 * for us to drop the lock, so we can call .power_off() and let
-		 * the genpd_poweron() restore power for us (this shouldn't
+		 * the genpd_power_on() restore power for us (this shouldn't
 		 * happen very often).
 		 */
-		ret = genpd_power_off(genpd, true);
+		ret = _genpd_power_off(genpd, true);
 		if (ret)
 			return ret;
 	}
@@ -459,7 +459,7 @@ static void genpd_power_off_work_fn(struct work_struct *work)
 	genpd = container_of(work, struct generic_pm_domain, power_off_work);
 
 	genpd_lock(genpd);
-	genpd_poweroff(genpd, true);
+	genpd_power_off(genpd, true);
 	genpd_unlock(genpd);
 }
 
@@ -578,7 +578,7 @@ static int genpd_runtime_suspend(struct device *dev)
 		return 0;
 
 	genpd_lock(genpd);
-	genpd_poweroff(genpd, false);
+	genpd_power_off(genpd, false);
 	genpd_unlock(genpd);
 
 	return 0;
@@ -618,7 +618,7 @@ static int genpd_runtime_resume(struct device *dev)
 	}
 
 	genpd_lock(genpd);
-	ret = genpd_poweron(genpd, 0);
+	ret = genpd_power_on(genpd, 0);
 	genpd_unlock(genpd);
 
 	if (ret)
@@ -658,7 +658,7 @@ static int genpd_runtime_resume(struct device *dev)
 	if (!pm_runtime_is_irq_safe(dev) ||
 		(pm_runtime_is_irq_safe(dev) && genpd_is_irq_safe(genpd))) {
 		genpd_lock(genpd);
-		genpd_poweroff(genpd, 0);
+		genpd_power_off(genpd, 0);
 		genpd_unlock(genpd);
 	}
 
@@ -674,9 +674,9 @@ static int __init pd_ignore_unused_setup(char *__unused)
 __setup("pd_ignore_unused", pd_ignore_unused_setup);
 
 /**
- * genpd_poweroff_unused - Power off all PM domains with no devices in use.
+ * genpd_power_off_unused - Power off all PM domains with no devices in use.
  */
-static int __init genpd_poweroff_unused(void)
+static int __init genpd_power_off_unused(void)
 {
 	struct generic_pm_domain *genpd;
 
@@ -694,7 +694,7 @@ static int __init genpd_poweroff_unused(void)
 
 	return 0;
 }
-late_initcall(genpd_poweroff_unused);
+late_initcall(genpd_power_off_unused);
 
 #if defined(CONFIG_PM_SLEEP) || defined(CONFIG_PM_GENERIC_DOMAINS_OF)
 
@@ -727,7 +727,7 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,
 }
 
 /**
- * genpd_sync_poweroff - Synchronously power off a PM domain and its masters.
+ * genpd_sync_power_off - Synchronously power off a PM domain and its masters.
  * @genpd: PM domain to power off, if possible.
  *
  * Check if the given PM domain can be powered off (during system suspend or
@@ -738,7 +738,7 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,
  * executed sequentially, so it is guaranteed that it will never run twice in
  * parallel).
  */
-static void genpd_sync_poweroff(struct generic_pm_domain *genpd)
+static void genpd_sync_power_off(struct generic_pm_domain *genpd)
 {
 	struct gpd_link *link;
 
@@ -751,18 +751,18 @@ static void genpd_sync_poweroff(struct generic_pm_domain *genpd)
 
 	/* Choose the deepest state when suspending */
 	genpd->state_idx = genpd->state_count - 1;
-	genpd_power_off(genpd, false);
+	_genpd_power_off(genpd, false);
 
 	genpd->status = GPD_STATE_POWER_OFF;
 
 	list_for_each_entry(link, &genpd->slave_links, slave_node) {
 		genpd_sd_counter_dec(link->master);
-		genpd_sync_poweroff(link->master);
+		genpd_sync_power_off(link->master);
 	}
 }
 
 /**
- * genpd_sync_poweron - Synchronously power on a PM domain and its masters.
+ * genpd_sync_power_on - Synchronously power on a PM domain and its masters.
  * @genpd: PM domain to power on.
  *
  * This function is only called in "noirq" and "syscore" stages of system power
@@ -770,7 +770,7 @@ static void genpd_sync_poweroff(struct generic_pm_domain *genpd)
  * executed sequentially, so it is guaranteed that it will never run twice in
  * parallel).
  */
-static void genpd_sync_poweron(struct generic_pm_domain *genpd)
+static void genpd_sync_power_on(struct generic_pm_domain *genpd)
 {
 	struct gpd_link *link;
 
@@ -778,11 +778,11 @@ static void genpd_sync_poweron(struct generic_pm_domain *genpd)
 		return;
 
 	list_for_each_entry(link, &genpd->slave_links, slave_node) {
-		genpd_sync_poweron(link->master);
+		genpd_sync_power_on(link->master);
 		genpd_sd_counter_inc(link->master);
 	}
 
-	genpd_power_on(genpd, false);
+	_genpd_power_on(genpd, false);
 
 	genpd->status = GPD_STATE_ACTIVE;
 }
@@ -894,7 +894,7 @@ static int pm_genpd_suspend_noirq(struct device *dev)
 	 * the same PM domain, so it is not necessary to use locking here.
 	 */
 	genpd->suspended_count++;
-	genpd_sync_poweroff(genpd);
+	genpd_sync_power_off(genpd);
 
 	return 0;
 }
@@ -924,7 +924,7 @@ static int pm_genpd_resume_noirq(struct device *dev)
 	 * guaranteed that this function will never run twice in parallel for
 	 * the same PM domain, so it is not necessary to use locking here.
 	 */
-	genpd_sync_poweron(genpd);
+	genpd_sync_power_on(genpd);
 	genpd->suspended_count--;
 
 	if (genpd->dev_ops.stop && genpd->dev_ops.start)
@@ -1012,12 +1012,12 @@ static int pm_genpd_restore_noirq(struct device *dev)
 	if (genpd->suspended_count++ == 0)
 		/*
 		 * The boot kernel might put the domain into arbitrary state,
-		 * so make it appear as powered off to genpd_sync_poweron(),
+		 * so make it appear as powered off to genpd_sync_power_on(),
 		 * so that it tries to power it on in case it was really off.
 		 */
 		genpd->status = GPD_STATE_POWER_OFF;
 
-	genpd_sync_poweron(genpd);
+	genpd_sync_power_on(genpd);
 
 	if (genpd->dev_ops.stop && genpd->dev_ops.start)
 		ret = pm_runtime_force_resume(dev);
@@ -1072,9 +1072,9 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
 
 	if (suspend) {
 		genpd->suspended_count++;
-		genpd_sync_poweroff(genpd);
+		genpd_sync_power_off(genpd);
 	} else {
-		genpd_sync_poweron(genpd);
+		genpd_sync_power_on(genpd);
 		genpd->suspended_count--;
 	}
 }
@@ -2043,7 +2043,7 @@ int genpd_dev_pm_attach(struct device *dev)
 	dev->pm_domain->sync = genpd_dev_pm_sync;
 
 	genpd_lock(pd);
-	ret = genpd_poweron(pd, 0);
+	ret = genpd_power_on(pd, 0);
 	genpd_unlock(pd);
 out:
 	return ret ? -EPROBE_DEFER : 0;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks
  2017-01-12 17:17 [RESEND PATCH 0/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
  2017-01-12 17:17 ` [RESEND PATCH 1/2] PM / Domains: Rename functions in genpd for power on/off Ulf Hansson
@ 2017-01-12 17:17 ` Ulf Hansson
  2017-02-07 13:23   ` Geert Uytterhoeven
  1 sibling, 1 reply; 6+ messages in thread
From: Ulf Hansson @ 2017-01-12 17:17 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ulf Hansson, linux-pm
  Cc: Len Brown, Pavel Machek, Kevin Hilman, Geert Uytterhoeven,
	Lina Iyer, Jon Hunter, Marek Szyprowski, Brian Norris

As the PM core may invoke the *noirq() callbacks asynchronously, the
current lock-less approach in genpd doesn't work. The consequence is that
we may find concurrent operations racing to power on/off the PM domain.

As of now, no immediate errors has been reported, but it's probably only a
matter time. Therefor let's fix the problem now before this becomes a real
issue, by deploying the locking scheme to the relevant functions.

Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index fd2e3e1..6b23d82 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -729,16 +729,17 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,
 /**
  * genpd_sync_power_off - Synchronously power off a PM domain and its masters.
  * @genpd: PM domain to power off, if possible.
+ * @depth: nesting count for lockdep.
  *
  * Check if the given PM domain can be powered off (during system suspend or
  * hibernation) and do that if so.  Also, in that case propagate to its masters.
  *
  * This function is only called in "noirq" and "syscore" stages of system power
- * transitions, so it need not acquire locks (all of the "noirq" callbacks are
- * executed sequentially, so it is guaranteed that it will never run twice in
- * parallel).
+ * transitions, but since the "noirq" callbacks may be executed asynchronously,
+ * the lock must be held.
  */
-static void genpd_sync_power_off(struct generic_pm_domain *genpd)
+static void genpd_sync_power_off(struct generic_pm_domain *genpd,
+				 unsigned int depth)
 {
 	struct gpd_link *link;
 
@@ -757,20 +758,24 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd)
 
 	list_for_each_entry(link, &genpd->slave_links, slave_node) {
 		genpd_sd_counter_dec(link->master);
-		genpd_sync_power_off(link->master);
+
+		genpd_lock_nested(link->master, depth + 1);
+		genpd_sync_power_off(link->master, depth + 1);
+		genpd_unlock(link->master);
 	}
 }
 
 /**
  * genpd_sync_power_on - Synchronously power on a PM domain and its masters.
  * @genpd: PM domain to power on.
+ * @depth: nesting count for lockdep.
  *
  * This function is only called in "noirq" and "syscore" stages of system power
- * transitions, so it need not acquire locks (all of the "noirq" callbacks are
- * executed sequentially, so it is guaranteed that it will never run twice in
- * parallel).
+ * transitions, but since the "noirq" callbacks may be executed asynchronously,
+ * the lock must be held.
  */
-static void genpd_sync_power_on(struct generic_pm_domain *genpd)
+static void genpd_sync_power_on(struct generic_pm_domain *genpd,
+				unsigned int depth)
 {
 	struct gpd_link *link;
 
@@ -778,8 +783,11 @@ static void genpd_sync_power_on(struct generic_pm_domain *genpd)
 		return;
 
 	list_for_each_entry(link, &genpd->slave_links, slave_node) {
-		genpd_sync_power_on(link->master);
 		genpd_sd_counter_inc(link->master);
+
+		genpd_lock_nested(link->master, depth + 1);
+		genpd_sync_power_on(link->master, depth + 1);
+		genpd_unlock(link->master);
 	}
 
 	_genpd_power_on(genpd, false);
@@ -888,13 +896,10 @@ static int pm_genpd_suspend_noirq(struct device *dev)
 			return ret;
 	}
 
-	/*
-	 * Since all of the "noirq" callbacks are executed sequentially, it is
-	 * guaranteed that this function will never run twice in parallel for
-	 * the same PM domain, so it is not necessary to use locking here.
-	 */
+	genpd_lock(genpd);
 	genpd->suspended_count++;
-	genpd_sync_power_off(genpd);
+	genpd_sync_power_off(genpd, 0);
+	genpd_unlock(genpd);
 
 	return 0;
 }
@@ -919,13 +924,10 @@ static int pm_genpd_resume_noirq(struct device *dev)
 	if (dev->power.wakeup_path && genpd_dev_active_wakeup(genpd, dev))
 		return 0;
 
-	/*
-	 * Since all of the "noirq" callbacks are executed sequentially, it is
-	 * guaranteed that this function will never run twice in parallel for
-	 * the same PM domain, so it is not necessary to use locking here.
-	 */
-	genpd_sync_power_on(genpd);
+	genpd_lock(genpd);
+	genpd_sync_power_on(genpd, 0);
 	genpd->suspended_count--;
+	genpd_unlock(genpd);
 
 	if (genpd->dev_ops.stop && genpd->dev_ops.start)
 		ret = pm_runtime_force_resume(dev);
@@ -1002,13 +1004,10 @@ static int pm_genpd_restore_noirq(struct device *dev)
 		return -EINVAL;
 
 	/*
-	 * Since all of the "noirq" callbacks are executed sequentially, it is
-	 * guaranteed that this function will never run twice in parallel for
-	 * the same PM domain, so it is not necessary to use locking here.
-	 *
 	 * At this point suspended_count == 0 means we are being run for the
 	 * first time for the given domain in the present cycle.
 	 */
+	genpd_lock(genpd);
 	if (genpd->suspended_count++ == 0)
 		/*
 		 * The boot kernel might put the domain into arbitrary state,
@@ -1017,7 +1016,8 @@ static int pm_genpd_restore_noirq(struct device *dev)
 		 */
 		genpd->status = GPD_STATE_POWER_OFF;
 
-	genpd_sync_power_on(genpd);
+	genpd_sync_power_on(genpd, 0);
+	genpd_unlock(genpd);
 
 	if (genpd->dev_ops.stop && genpd->dev_ops.start)
 		ret = pm_runtime_force_resume(dev);
@@ -1070,13 +1070,17 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
 	if (!pm_genpd_present(genpd))
 		return;
 
+	genpd_lock(genpd);
+
 	if (suspend) {
 		genpd->suspended_count++;
-		genpd_sync_power_off(genpd);
+		genpd_sync_power_off(genpd, 0);
 	} else {
-		genpd_sync_power_on(genpd);
+		genpd_sync_power_on(genpd, 0);
 		genpd->suspended_count--;
 	}
+
+	genpd_unlock(genpd);
 }
 
 void pm_genpd_syscore_poweroff(struct device *dev)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RESEND PATCH 1/2] PM / Domains: Rename functions in genpd for power on/off
  2017-01-12 17:17 ` [RESEND PATCH 1/2] PM / Domains: Rename functions in genpd for power on/off Ulf Hansson
@ 2017-01-13 19:53   ` Kevin Hilman
  0 siblings, 0 replies; 6+ messages in thread
From: Kevin Hilman @ 2017-01-13 19:53 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Len Brown, Pavel Machek,
	Geert Uytterhoeven, Lina Iyer, Jon Hunter, Marek Szyprowski,
	Brian Norris

Ulf Hansson <ulf.hansson@linaro.org> writes:

> Currently the mix of genpd_poweron(), genpd_power_on(),
> genpd_sync_poweron() and the ->power_on() callback, makes a bit difficult
> to follow the path of execution. The similar applies to the functions
> dealing with power off.
>
> In a way to improve this understanding, let's do the following renaming:
>
> genpd_power_on() ->  _genpd_power_on()
> genpd_poweron() -> genpd_power_on()
> genpd_sync_poweron() -> genpd_sync_power_on()
>
> genpd_power_off() -> _genpd_power_off()
> genpd_poweroff() -> genpd_power_off()
> genpd_sync_poweroff() -> genpd_sync_power_off()
> genpd_poweroff_unused() -> genpd_power_off_unused()
>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

Acked-by: Kevin Hilman <khilman@baylibre.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks
  2017-01-12 17:17 ` [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
@ 2017-02-07 13:23   ` Geert Uytterhoeven
  2017-02-07 15:07     ` Ulf Hansson
  0 siblings, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2017-02-07 13:23 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J. Wysocki
  Cc: Linux PM list, Len Brown, Pavel Machek, Kevin Hilman, Lina Iyer,
	Jon Hunter, Marek Szyprowski, Brian Norris, Linux-Renesas

Hi Ulf, Rafael,

On Thu, Jan 12, 2017 at 6:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> As the PM core may invoke the *noirq() callbacks asynchronously, the
> current lock-less approach in genpd doesn't work. The consequence is that
> we may find concurrent operations racing to power on/off the PM domain.
>
> As of now, no immediate errors has been reported, but it's probably only a
> matter time. Therefor let's fix the problem now before this becomes a real
> issue, by deploying the locking scheme to the relevant functions.
>
> Reported-by: Brian Norris <briannorris@chromium.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

This is commit ef4f7e2c8335a56b in pm/linux-next.

> ---
>  drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------
>  1 file changed, 33 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index fd2e3e1..6b23d82 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -729,16 +729,17 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,

> @@ -1070,13 +1070,17 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
>         if (!pm_genpd_present(genpd))
>                 return;
>
> +       genpd_lock(genpd);
> +
>         if (suspend) {
>                 genpd->suspended_count++;
> -               genpd_sync_power_off(genpd);
> +               genpd_sync_power_off(genpd, 0);
>         } else {
> -               genpd_sync_power_on(genpd);
> +               genpd_sync_power_on(genpd, 0);
>                 genpd->suspended_count--;
>         }
> +
> +       genpd_unlock(genpd);
>  }

This causes the following BUG on all my Renesas arm32 boards, where the
system timer is an IRQ safe device:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:232
in_atomic(): 0, irqs_disabled(): 128, pid: 1751, name: s2ram
CPU: 0 PID: 1751 Comm: s2ram Not tainted
4.10.0-rc7-koelsch-05643-g27f4c73972a614fe #3354
Hardware name: Generic R8A7791 (Flattened Device Tree)
[<c020e9c4>] (unwind_backtrace) from [<c020a40c>] (show_stack+0x10/0x14)
[<c020a40c>] (show_stack) from [<c03f9818>] (dump_stack+0x7c/0x9c)
[<c03f9818>] (dump_stack) from [<c0240020>] (___might_sleep+0x124/0x160)
[<c0240020>] (___might_sleep) from [<c06fedfc>] (mutex_lock+0x18/0x60)
[<c06fedfc>] (mutex_lock) from [<c04de340>] (genpd_syscore_switch+0x2c/0x7c)
[<c04de340>] (genpd_syscore_switch) from [<c05ec328>]
(sh_cmt_clock_event_suspend+0x18/0x28)
[<c05ec328>] (sh_cmt_clock_event_suspend) from [<c027f9a4>]
(clockevents_suspend+0x40/0x54)
[<c027f9a4>] (clockevents_suspend) from [<c0276d48>]
(timekeeping_suspend+0x23c/0x278)
[<c0276d48>] (timekeeping_suspend) from [<c04cbb88>]
(syscore_suspend+0x88/0x138)
[<c04cbb88>] (syscore_suspend) from [<c025f618>]
(suspend_devices_and_enter+0x290/0x470)
[<c025f618>] (suspend_devices_and_enter) from [<c025fa20>]
(pm_suspend+0x228/0x280)
[<c025fa20>] (pm_suspend) from [<c025e60c>] (state_store+0xac/0xcc)
[<c025e60c>] (state_store) from [<c0340c4c>] (kernfs_fop_write+0x160/0x19c)
[<c0340c4c>] (kernfs_fop_write) from [<c02e3054>] (__vfs_write+0x20/0x108)
[<c02e3054>] (__vfs_write) from [<c02e4424>] (vfs_write+0xb8/0x144)
[<c02e4424>] (vfs_write) from [<c02e5014>] (SyS_write+0x40/0x80)
[<c02e5014>] (SyS_write) from [<c0206cc0>] (ret_fast_syscall+0x0/0x34)

Reverting the commit fixes the issue.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks
  2017-02-07 13:23   ` Geert Uytterhoeven
@ 2017-02-07 15:07     ` Ulf Hansson
  0 siblings, 0 replies; 6+ messages in thread
From: Ulf Hansson @ 2017-02-07 15:07 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Linux PM list, Len Brown, Pavel Machek,
	Kevin Hilman, Lina Iyer, Jon Hunter, Marek Szyprowski,
	Brian Norris, Linux-Renesas

On 7 February 2017 at 14:23, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Ulf, Rafael,
>
> On Thu, Jan 12, 2017 at 6:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> As the PM core may invoke the *noirq() callbacks asynchronously, the
>> current lock-less approach in genpd doesn't work. The consequence is that
>> we may find concurrent operations racing to power on/off the PM domain.
>>
>> As of now, no immediate errors has been reported, but it's probably only a
>> matter time. Therefor let's fix the problem now before this becomes a real
>> issue, by deploying the locking scheme to the relevant functions.
>>
>> Reported-by: Brian Norris <briannorris@chromium.org>
>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> This is commit ef4f7e2c8335a56b in pm/linux-next.
>
>> ---
>>  drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------
>>  1 file changed, 33 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index fd2e3e1..6b23d82 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -729,16 +729,17 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,
>
>> @@ -1070,13 +1070,17 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)
>>         if (!pm_genpd_present(genpd))
>>                 return;
>>
>> +       genpd_lock(genpd);
>> +
>>         if (suspend) {
>>                 genpd->suspended_count++;
>> -               genpd_sync_power_off(genpd);
>> +               genpd_sync_power_off(genpd, 0);
>>         } else {
>> -               genpd_sync_power_on(genpd);
>> +               genpd_sync_power_on(genpd, 0);
>>                 genpd->suspended_count--;
>>         }
>> +
>> +       genpd_unlock(genpd);
>>  }
>
> This causes the following BUG on all my Renesas arm32 boards, where the
> system timer is an IRQ safe device:
>
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:232
> in_atomic(): 0, irqs_disabled(): 128, pid: 1751, name: s2ram
> CPU: 0 PID: 1751 Comm: s2ram Not tainted
> 4.10.0-rc7-koelsch-05643-g27f4c73972a614fe #3354
> Hardware name: Generic R8A7791 (Flattened Device Tree)
> [<c020e9c4>] (unwind_backtrace) from [<c020a40c>] (show_stack+0x10/0x14)
> [<c020a40c>] (show_stack) from [<c03f9818>] (dump_stack+0x7c/0x9c)
> [<c03f9818>] (dump_stack) from [<c0240020>] (___might_sleep+0x124/0x160)
> [<c0240020>] (___might_sleep) from [<c06fedfc>] (mutex_lock+0x18/0x60)
> [<c06fedfc>] (mutex_lock) from [<c04de340>] (genpd_syscore_switch+0x2c/0x7c)
> [<c04de340>] (genpd_syscore_switch) from [<c05ec328>]
> (sh_cmt_clock_event_suspend+0x18/0x28)
> [<c05ec328>] (sh_cmt_clock_event_suspend) from [<c027f9a4>]
> (clockevents_suspend+0x40/0x54)
> [<c027f9a4>] (clockevents_suspend) from [<c0276d48>]
> (timekeeping_suspend+0x23c/0x278)
> [<c0276d48>] (timekeeping_suspend) from [<c04cbb88>]
> (syscore_suspend+0x88/0x138)
> [<c04cbb88>] (syscore_suspend) from [<c025f618>]
> (suspend_devices_and_enter+0x290/0x470)
> [<c025f618>] (suspend_devices_and_enter) from [<c025fa20>]
> (pm_suspend+0x228/0x280)
> [<c025fa20>] (pm_suspend) from [<c025e60c>] (state_store+0xac/0xcc)
> [<c025e60c>] (state_store) from [<c0340c4c>] (kernfs_fop_write+0x160/0x19c)
> [<c0340c4c>] (kernfs_fop_write) from [<c02e3054>] (__vfs_write+0x20/0x108)
> [<c02e3054>] (__vfs_write) from [<c02e4424>] (vfs_write+0xb8/0x144)
> [<c02e4424>] (vfs_write) from [<c02e5014>] (SyS_write+0x40/0x80)
> [<c02e5014>] (SyS_write) from [<c0206cc0>] (ret_fast_syscall+0x0/0x34)
>
> Reverting the commit fixes the issue.

Geert, thanks for reporting! Instead of a revert, I believe a better
option is to only partly revert the change.

What I suggest is to keep the locks for the *noirq() callbacks, but
remove it for the pm_genpd_syscore* functions.

Actually the change-log I wrote for the related commit, don't mention
the changes it introduces to the syscore API - my bad! I apologize for
the inconvenience.

Allow me to send a fix on top asap. Whether Rafael can fold it into
the existing commit or add on top, I leave to him to decide.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-02-07 15:07 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-12 17:17 [RESEND PATCH 0/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
2017-01-12 17:17 ` [RESEND PATCH 1/2] PM / Domains: Rename functions in genpd for power on/off Ulf Hansson
2017-01-13 19:53   ` Kevin Hilman
2017-01-12 17:17 ` [RESEND PATCH 2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks Ulf Hansson
2017-02-07 13:23   ` Geert Uytterhoeven
2017-02-07 15:07     ` Ulf Hansson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).