* [BUG] mutex deadlock of dpm_resume() in low memory situation
[not found] <CGME20231227084252epcas2p3b063f7852f81f82cd0a31afd7f404db4@epcas2p3.samsung.com>
@ 2023-12-27 8:42 ` Youngmin Nam
2023-12-27 16:08 ` Greg KH
0 siblings, 1 reply; 25+ messages in thread
From: Youngmin Nam @ 2023-12-27 8:42 UTC (permalink / raw)
To: rafael, len.brown, pavel, gregkh
Cc: linux-pm, linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu,
youngmin.nam
[-- Attachment #1: Type: text/plain, Size: 6085 bytes --]
Hi all.
I'm reporting an issue which looks like an upstream kernel bug.
We are using 6.1, but I think all versions can be affected by this issue.
A mutex deadlock occurred in a low memory situation when the device ran dpm_resume().
Here's the problematic situation.
#1. Currently, the device is in a low memory situation, as below.
[4: binder:569_5:27109] SLUB: Unable to allocate memory on node -1, gfp=0xb20(GFP_ATOMIC|__GFP_ZERO)
[4: binder:569_5:27109] cache: kmalloc-128, object size: 128, buffer size: 128, default order: 0, min order: 0
[4: binder:569_5:27109] node 0: slabs: 1865, objs: 59680, free: 0
[4: binder:569_5:27109] binder:569_5: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[4: binder:569_5:27109] CPU: 4 PID: 27109 Comm: binder:569_5 Tainted: G S C E 6.1.43-android14-11-abS921BXXU1AWLB #1
[4: binder:569_5:27109] Hardware name: Samsung E1S EUR OPENX board based on S5E9945 (DT)
[4: binder:569_5:27109] Call trace:
[4: binder:569_5:27109] dump_backtrace+0xf4/0x118
[4: binder:569_5:27109] show_stack+0x18/0x24
[4: binder:569_5:27109] dump_stack_lvl+0x60/0x7c
[4: binder:569_5:27109] dump_stack+0x18/0x38
[4: binder:569_5:27109] warn_alloc+0xf4/0x190
[4: binder:569_5:27109] __alloc_pages_slowpath+0x10ec/0x12ac
[4: binder:569_5:27109] __alloc_pages+0x27c/0x2fc
[4: binder:569_5:27109] new_slab+0x17c/0x4e0
[4: binder:569_5:27109] ___slab_alloc+0x4e4/0x8a8
[4: binder:569_5:27109] __slab_alloc+0x34/0x6c
[4: binder:569_5:27109] __kmem_cache_alloc_node+0x1f4/0x260
[4: binder:569_5:27109] kmalloc_trace+0x4c/0x144
[4: binder:569_5:27109] async_schedule_node_domain+0x40/0x1ec
[4: binder:569_5:27109] async_schedule_node+0x18/0x28
[4: binder:569_5:27109] dpm_suspend+0xfc/0x48c
#2. Process A runs dpm_resume() and acquires the "dpm_list_mtx" lock, as below.
1000 void dpm_resume(pm_message_t state)
1001 {
1002 struct device *dev;
1003 ktime_t starttime = ktime_get();
1004
1005 trace_suspend_resume(TPS("dpm_resume"), state.event, true);
1006 might_sleep();
1007
1008 mutex_lock(&dpm_list_mtx); <-------- process A acquired the lock
1009 pm_transition = state;
1010 async_error = 0;
1011
1012 list_for_each_entry(dev, &dpm_suspended_list, power.entry)
1013 dpm_async_fn(dev, async_resume);
#3. Process A continues to run the functions below.
dpm_async_fn()
--> async_schedule_dev()
--> async_schedule_node()
--> async_schedule_node_domain()
#4. The kzalloc() will fail because of the low memory situation.
165 async_cookie_t async_schedule_node_domain(async_func_t func, void *data,
166 int node, struct async_domain *domain)
167 {
168 struct async_entry *entry;
169 unsigned long flags;
170 async_cookie_t newcookie;
171
172 /* allow irq-off callers */
173 entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC); <--- will fail
174
175 /*
176 * If we're out of memory or if there's too much work
177 * pending already, we execute synchronously.
178 */
179 if (!entry || atomic_read(&entry_count) > MAX_WORK) {
180 kfree(entry);
181 spin_lock_irqsave(&async_lock, flags);
182 newcookie = next_cookie++;
183 spin_unlock_irqrestore(&async_lock, flags);
184
185 /* low on memory.. run synchronously */
186 func(data, newcookie); <--- process A will run func(), which is async_resume()
187 return newcookie;
#5. Process A continues to run the functions below.
async_resume()
--> device_resume()
--> dpm_wait_for_superior()
#6. Process A tries to acquire the lock which it already holds.
278 static bool dpm_wait_for_superior(struct device *dev, bool async)
279 {
280 struct device *parent;
281
282 /*
283 * If the device is resumed asynchronously and the parent's callback
284 * deletes both the device and the parent itself, the parent object may
285 * be freed while this function is running, so avoid that by reference
286 * counting the parent once more unless the device has been deleted
287 * already (in which case return right away).
288 */
289 mutex_lock(&dpm_list_mtx);
So we think this situation can cause a mutex deadlock in the suspend/resume sequence.
Here's process A's call stack from the kernel log. (binder:569_5)
I[4: swapper/4: 0] pid uTime sTime last_arrival last_queued stat cpu task_struct comm [wait channel]
I[4: swapper/4: 0] 27109 0 1758074533 2396019774802 0 D( 2) 5 ffffff8044bc92c0 binder:569_5 [dpm_wait_for_superior]
I[4: swapper/4: 0] Mutex: dpm_list_mtx+0x0/0x30: owner[0xffffff8044bc92c0 binder:569_5 :27109]
I[4: swapper/4: 0] Call trace:
I[4: swapper/4: 0] __switch_to+0x174/0x338
I[4: swapper/4: 0] __schedule+0x5ec/0x9cc
I[4: swapper/4: 0] schedule+0x7c/0xe8
I[4: swapper/4: 0] schedule_preempt_disabled+0x24/0x40
I[4: swapper/4: 0] __mutex_lock+0x408/0xdac
I[4: swapper/4: 0] __mutex_lock_slowpath+0x14/0x24
I[4: swapper/4: 0] mutex_lock+0x40/0xec <-------- trying to acquire dpm_list_mtx again
I[4: swapper/4: 0] dpm_wait_for_superior+0x30/0x148
I[4: swapper/4: 0] device_resume+0x38/0x1e4
I[4: swapper/4: 0] async_resume+0x24/0xf4
I[4: swapper/4: 0] async_schedule_node_domain+0xb0/0x1ec
I[4: swapper/4: 0] async_schedule_node+0x18/0x28
I[4: swapper/4: 0] dpm_resume+0xbc/0x578 <------- acquired dpm_list_mtx
I[4: swapper/4: 0] dpm_resume_end+0x1c/0x38
I[4: swapper/4: 0] suspend_devices_and_enter+0x83c/0xb2c
I[4: swapper/4: 0] pm_suspend+0x34c/0x618
I[4: swapper/4: 0] state_store+0x104/0x144
Could you look into this issue?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [BUG] mutex deadlock of dpm_resume() in low memory situation
2023-12-27 8:42 ` [BUG] mutex deadlock of dpm_resume() in low memory situation Youngmin Nam
@ 2023-12-27 16:08 ` Greg KH
2023-12-27 17:44 ` Rafael J. Wysocki
2023-12-27 18:39 ` Rafael J. Wysocki
0 siblings, 2 replies; 25+ messages in thread
From: Greg KH @ 2023-12-27 16:08 UTC (permalink / raw)
To: Youngmin Nam
Cc: rafael, len.brown, pavel, linux-pm, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu
On Wed, Dec 27, 2023 at 05:42:50PM +0900, Youngmin Nam wrote:
> Could you look into this issue ?
Can you submit a patch that resolves the issue for you, as you have a
way to actually test this out? That would be the quickest way to get it
resolved, and to help confirm that this is even an issue at all.
thanks,
greg k-h
* Re: [BUG] mutex deadlock of dpm_resume() in low memory situation
2023-12-27 16:08 ` Greg KH
@ 2023-12-27 17:44 ` Rafael J. Wysocki
2023-12-27 18:39 ` Rafael J. Wysocki
1 sibling, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 17:44 UTC (permalink / raw)
To: Greg KH
Cc: Youngmin Nam, rafael, len.brown, pavel, linux-pm, linux-kernel,
d7271.choe, janghyuck.kim, hyesoo.yu
On Wed, Dec 27, 2023 at 5:08 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Dec 27, 2023 at 05:42:50PM +0900, Youngmin Nam wrote:
> > Could you look into this issue ?
>
> Can you submit a patch that resolves the issue for you, as you have a
> way to actually test this out? That would be the quickest way to get it
> resolved, and to help confirm that this is even an issue at all.
This is a real problem, unfortunately, and the fix would require some
infra changes AFAICS.
To address it, we would need a variant of async_schedule_node_domain()
that would bail out on low memory instead of attempting to run the
stuff synchronously, which is harmful (and not just for the deadlock
reason) in the suspend-resume paths.
I'll try to cut a test patch shortly.
* Re: [BUG] mutex deadlock of dpm_resume() in low memory situation
2023-12-27 16:08 ` Greg KH
2023-12-27 17:44 ` Rafael J. Wysocki
@ 2023-12-27 18:39 ` Rafael J. Wysocki
2023-12-27 18:58 ` Rafael J. Wysocki
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
1 sibling, 2 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 18:39 UTC (permalink / raw)
To: Youngmin Nam, Greg KH
Cc: rafael, len.brown, pavel, linux-pm, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu
On Wednesday, December 27, 2023 5:08:40 PM CET Greg KH wrote:
> On Wed, Dec 27, 2023 at 05:42:50PM +0900, Youngmin Nam wrote:
> > Could you look into this issue ?
>
> Can you submit a patch that resolves the issue for you, as you have a
> way to actually test this out? That would be the quickest way to get it
> resolved, and to help confirm that this is even an issue at all.
Something like the appended patch should be sufficient to address this AFAICS.
I haven't tested it yet (will do so shortly), so all of the usual disclaimers
apply.
I think that it can be split into 2 patches, but for easier testing here
it goes in one piece.
Fixes: f2a424f6c613 ("PM / core: Introduce dpm_async_fn() helper")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/base/power/main.c | 12 ++++--
include/linux/async.h | 2 +
kernel/async.c | 85 ++++++++++++++++++++++++++++++++++------------
3 files changed, 73 insertions(+), 26 deletions(-)
Index: linux-pm/kernel/async.c
===================================================================
--- linux-pm.orig/kernel/async.c
+++ linux-pm/kernel/async.c
@@ -145,6 +145,39 @@ static void async_run_entry_fn(struct wo
wake_up(&async_done);
}
+static async_cookie_t __async_schedule_node_domain(async_func_t func,
+ void *data, int node,
+ struct async_domain *domain,
+ struct async_entry *entry)
+{
+ async_cookie_t newcookie;
+ unsigned long flags;
+
+ INIT_LIST_HEAD(&entry->domain_list);
+ INIT_LIST_HEAD(&entry->global_list);
+ INIT_WORK(&entry->work, async_run_entry_fn);
+ entry->func = func;
+ entry->data = data;
+ entry->domain = domain;
+
+ spin_lock_irqsave(&async_lock, flags);
+
+ /* allocate cookie and queue */
+ newcookie = entry->cookie = next_cookie++;
+
+ list_add_tail(&entry->domain_list, &domain->pending);
+ if (domain->registered)
+ list_add_tail(&entry->global_list, &async_global_pending);
+
+ atomic_inc(&entry_count);
+ spin_unlock_irqrestore(&async_lock, flags);
+
+ /* schedule for execution */
+ queue_work_node(node, system_unbound_wq, &entry->work);
+
+ return newcookie;
+}
+
/**
* async_schedule_node_domain - NUMA specific version of async_schedule_domain
* @func: function to execute asynchronously
@@ -186,29 +219,8 @@ async_cookie_t async_schedule_node_domai
func(data, newcookie);
return newcookie;
}
- INIT_LIST_HEAD(&entry->domain_list);
- INIT_LIST_HEAD(&entry->global_list);
- INIT_WORK(&entry->work, async_run_entry_fn);
- entry->func = func;
- entry->data = data;
- entry->domain = domain;
-
- spin_lock_irqsave(&async_lock, flags);
-
- /* allocate cookie and queue */
- newcookie = entry->cookie = next_cookie++;
-
- list_add_tail(&entry->domain_list, &domain->pending);
- if (domain->registered)
- list_add_tail(&entry->global_list, &async_global_pending);
-
- atomic_inc(&entry_count);
- spin_unlock_irqrestore(&async_lock, flags);
-
- /* schedule for execution */
- queue_work_node(node, system_unbound_wq, &entry->work);
- return newcookie;
+ return __async_schedule_node_domain(func, data, node, domain, entry);
}
EXPORT_SYMBOL_GPL(async_schedule_node_domain);
@@ -232,6 +244,35 @@ async_cookie_t async_schedule_node(async
EXPORT_SYMBOL_GPL(async_schedule_node);
/**
+ * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
+ * @func: function to execute asynchronously
+ * @dev: device argument to be passed to function
+ *
+ * @dev is used as both the argument for the function and to provide NUMA
+ * context for where to run the function.
+ *
+ * If the asynchronous execution of @func is scheduled successfully, return
+ * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
+ * that will run the function synchronously then.
+ */
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
+{
+ struct async_entry *entry;
+
+ entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
+
+ /* Give up if there is no memory or too much work. */
+ if (!entry || atomic_read(&entry_count) > MAX_WORK) {
+ kfree(entry);
+ return false;
+ }
+
+ __async_schedule_node_domain(func, dev, dev_to_node(dev),
+ &async_dfl_domain, entry);
+ return true;
+}
+
+/**
* async_synchronize_full - synchronize all asynchronous function calls
*
* This function waits until all asynchronous function calls have been done.
Index: linux-pm/include/linux/async.h
===================================================================
--- linux-pm.orig/include/linux/async.h
+++ linux-pm/include/linux/async.h
@@ -90,6 +90,8 @@ async_schedule_dev(async_func_t func, st
return async_schedule_node(func, dev, dev_to_node(dev));
}
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev);
+
/**
* async_schedule_dev_domain - A device specific version of async_schedule_domain
* @func: function to execute asynchronously
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -668,11 +668,15 @@ static bool dpm_async_fn(struct device *
{
reinit_completion(&dev->power.completion);
- if (is_async(dev)) {
- get_device(dev);
- async_schedule_dev(func, dev);
+ if (!is_async(dev))
+ return false;
+
+ get_device(dev);
+
+ if (async_schedule_dev_nocall(func, dev))
return true;
- }
+
+ put_device(dev);
return false;
}
* Re: [BUG] mutex deadlock of dpm_resume() in low memory situation
2023-12-27 18:39 ` Rafael J. Wysocki
@ 2023-12-27 18:58 ` Rafael J. Wysocki
2023-12-27 20:50 ` Rafael J. Wysocki
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
1 sibling, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 18:58 UTC (permalink / raw)
To: Youngmin Nam, Greg KH
Cc: rafael, len.brown, pavel, linux-pm, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu
On Wednesday, December 27, 2023 7:39:20 PM CET Rafael J. Wysocki wrote:
> On Wednesday, December 27, 2023 5:08:40 PM CET Greg KH wrote:
> > On Wed, Dec 27, 2023 at 05:42:50PM +0900, Youngmin Nam wrote:
> > > Could you look into this issue ?
> >
> > Can you submit a patch that resolves the issue for you, as you have a
> > way to actually test this out? That would be the quickest way to get it
> > resolved, and to help confirm that this is even an issue at all.
>
> Something like the appended patch should be sufficient to address this AFAICS.
>
> I haven't tested it yet (will do so shortly), so all of the usual disclaimers
> apply.
>
> I think that it can be split into 2 patches, but for easier testing here
> it goes in one piece.
Well, please scratch this, it will not handle "async" devices properly.
* [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks
2023-12-27 18:39 ` Rafael J. Wysocki
2023-12-27 18:58 ` Rafael J. Wysocki
@ 2023-12-27 20:35 ` Rafael J. Wysocki
2023-12-27 20:37 ` [PATCH v1 1/3] async: Split async_schedule_node_domain() Rafael J. Wysocki
` (3 more replies)
1 sibling, 4 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 20:35 UTC (permalink / raw)
To: Greg KH, linux-pm
Cc: Youngmin Nam, rafael, linux-kernel, d7271.choe, janghyuck.kim,
hyesoo.yu, Alan Stern, Ulf Hansson
Hi Everyone,
As reported here
https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
the device suspend-resume code running during system-wide PM transitions
deadlocks on low memory, because it attempts to acquire a mutex that's
already held by it in those cases.
This series addresses the issue by changing the resume code behavior
to directly run the device PM functions synchronously if they cannot
be scheduled for asynchronous execution (patch [3/3]).
For this purpose, the async code is rearranged (patch [1/3]) and a
new variant of async_schedule_dev() is introduced (patch [2/3]).
Thanks!
* [PATCH v1 1/3] async: Split async_schedule_node_domain()
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
@ 2023-12-27 20:37 ` Rafael J. Wysocki
2023-12-27 20:38 ` [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall() Rafael J. Wysocki
` (2 subsequent siblings)
3 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 20:37 UTC (permalink / raw)
To: Greg KH, linux-pm
Cc: Youngmin Nam, rafael, linux-kernel, d7271.choe, janghyuck.kim,
hyesoo.yu, Alan Stern, Ulf Hansson
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In preparation for subsequent changes, split async_schedule_node_domain()
in two pieces so as to allow the bottom part of it to be called from a
somewhat different code path.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
kernel/async.c | 56 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 34 insertions(+), 22 deletions(-)
Index: linux-pm/kernel/async.c
===================================================================
--- linux-pm.orig/kernel/async.c
+++ linux-pm/kernel/async.c
@@ -145,6 +145,39 @@ static void async_run_entry_fn(struct wo
wake_up(&async_done);
}
+static async_cookie_t __async_schedule_node_domain(async_func_t func,
+ void *data, int node,
+ struct async_domain *domain,
+ struct async_entry *entry)
+{
+ async_cookie_t newcookie;
+ unsigned long flags;
+
+ INIT_LIST_HEAD(&entry->domain_list);
+ INIT_LIST_HEAD(&entry->global_list);
+ INIT_WORK(&entry->work, async_run_entry_fn);
+ entry->func = func;
+ entry->data = data;
+ entry->domain = domain;
+
+ spin_lock_irqsave(&async_lock, flags);
+
+ /* allocate cookie and queue */
+ newcookie = entry->cookie = next_cookie++;
+
+ list_add_tail(&entry->domain_list, &domain->pending);
+ if (domain->registered)
+ list_add_tail(&entry->global_list, &async_global_pending);
+
+ atomic_inc(&entry_count);
+ spin_unlock_irqrestore(&async_lock, flags);
+
+ /* schedule for execution */
+ queue_work_node(node, system_unbound_wq, &entry->work);
+
+ return newcookie;
+}
+
/**
* async_schedule_node_domain - NUMA specific version of async_schedule_domain
* @func: function to execute asynchronously
@@ -186,29 +219,8 @@ async_cookie_t async_schedule_node_domai
func(data, newcookie);
return newcookie;
}
- INIT_LIST_HEAD(&entry->domain_list);
- INIT_LIST_HEAD(&entry->global_list);
- INIT_WORK(&entry->work, async_run_entry_fn);
- entry->func = func;
- entry->data = data;
- entry->domain = domain;
-
- spin_lock_irqsave(&async_lock, flags);
- /* allocate cookie and queue */
- newcookie = entry->cookie = next_cookie++;
-
- list_add_tail(&entry->domain_list, &domain->pending);
- if (domain->registered)
- list_add_tail(&entry->global_list, &async_global_pending);
-
- atomic_inc(&entry_count);
- spin_unlock_irqrestore(&async_lock, flags);
-
- /* schedule for execution */
- queue_work_node(node, system_unbound_wq, &entry->work);
-
- return newcookie;
+ return __async_schedule_node_domain(func, data, node, domain, entry);
}
EXPORT_SYMBOL_GPL(async_schedule_node_domain);
* [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
2023-12-27 20:37 ` [PATCH v1 1/3] async: Split async_schedule_node_domain() Rafael J. Wysocki
@ 2023-12-27 20:38 ` Rafael J. Wysocki
2023-12-28 20:29 ` Stanislaw Gruszka
2023-12-27 20:41 ` [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code Rafael J. Wysocki
2024-01-02 13:18 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
3 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 20:38 UTC (permalink / raw)
To: Greg KH, linux-pm
Cc: Youngmin Nam, rafael, linux-kernel, d7271.choe, janghyuck.kim,
hyesoo.yu, Alan Stern, Ulf Hansson
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In preparation for subsequent changes, introduce a specialized variant
of async_schedule_dev() that will not invoke the argument function
synchronously when it cannot be scheduled for asynchronous execution.
The new function, async_schedule_dev_nocall(), will be used for fixing
possible deadlocks in the system-wide power management core code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/base/power/main.c | 12 ++++++++----
include/linux/async.h | 2 ++
kernel/async.c | 29 +++++++++++++++++++++++++++++
3 files changed, 39 insertions(+), 4 deletions(-)
Index: linux-pm/kernel/async.c
===================================================================
--- linux-pm.orig/kernel/async.c
+++ linux-pm/kernel/async.c
@@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async
EXPORT_SYMBOL_GPL(async_schedule_node);
/**
+ * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
+ * @func: function to execute asynchronously
+ * @dev: device argument to be passed to function
+ *
+ * @dev is used as both the argument for the function and to provide NUMA
+ * context for where to run the function.
+ *
+ * If the asynchronous execution of @func is scheduled successfully, return
+ * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
+ * that will run the function synchronously then.
+ */
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
+{
+ struct async_entry *entry;
+
+ entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
+
+ /* Give up if there is no memory or too much work. */
+ if (!entry || atomic_read(&entry_count) > MAX_WORK) {
+ kfree(entry);
+ return false;
+ }
+
+ __async_schedule_node_domain(func, dev, dev_to_node(dev),
+ &async_dfl_domain, entry);
+ return true;
+}
+
+/**
* async_synchronize_full - synchronize all asynchronous function calls
*
* This function waits until all asynchronous function calls have been done.
Index: linux-pm/include/linux/async.h
===================================================================
--- linux-pm.orig/include/linux/async.h
+++ linux-pm/include/linux/async.h
@@ -90,6 +90,8 @@ async_schedule_dev(async_func_t func, st
return async_schedule_node(func, dev, dev_to_node(dev));
}
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev);
+
/**
* async_schedule_dev_domain - A device specific version of async_schedule_domain
* @func: function to execute asynchronously
* [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
2023-12-27 20:37 ` [PATCH v1 1/3] async: Split async_schedule_node_domain() Rafael J. Wysocki
2023-12-27 20:38 ` [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall() Rafael J. Wysocki
@ 2023-12-27 20:41 ` Rafael J. Wysocki
2024-01-02 13:35 ` Ulf Hansson
2024-01-02 13:18 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
3 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 20:41 UTC (permalink / raw)
To: Greg KH, linux-pm
Cc: Youngmin Nam, rafael, linux-kernel, d7271.choe, janghyuck.kim,
hyesoo.yu, Alan Stern, Ulf Hansson
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is reported that in low-memory situations the system-wide resume core
code deadlocks, because async_schedule_dev() executes its argument
function synchronously if it cannot allocate memory (and not only then)
and that function attempts to acquire a mutex that is already held.
Address this by changing the code in question to use
async_schedule_dev_nocall() for scheduling the asynchronous
execution of device suspend and resume functions and to directly
run them synchronously if async_schedule_dev_nocall() returns false.
Fixes: 09beebd8f93b ("PM: sleep: core: Switch back to async_schedule_dev()")
Link: https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
Reported-by: Youngmin Nam <youngmin.nam@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
The commit pointed to by the Fixes: tag is the last one that modified
the code in question, even though the bug had been there already before.
Still, the fix will not apply to the code before that commit.
---
drivers/base/power/main.c | 148 +++++++++++++++++++++-------------------------
1 file changed, 68 insertions(+), 80 deletions(-)
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -579,7 +579,7 @@ bool dev_pm_skip_resume(struct device *d
}
/**
- * device_resume_noirq - Execute a "noirq resume" callback for given device.
+ * __device_resume_noirq - Execute a "noirq resume" callback for given device.
* @dev: Device to handle.
* @state: PM transition of the system being carried out.
* @async: If true, the device is being resumed asynchronously.
@@ -587,7 +587,7 @@ bool dev_pm_skip_resume(struct device *d
* The driver of @dev will not receive interrupts while this function is being
* executed.
*/
-static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
+static void __device_resume_noirq(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback = NULL;
const char *info = NULL;
@@ -655,7 +655,13 @@ Skip:
Out:
complete_all(&dev->power.completion);
TRACE_RESUME(error);
- return error;
+
+ if (error) {
+ suspend_stats.failed_resume_noirq++;
+ dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
+ dpm_save_failed_dev(dev_name(dev));
+ pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
+ }
}
static bool is_async(struct device *dev)
@@ -668,11 +674,15 @@ static bool dpm_async_fn(struct device *
{
reinit_completion(&dev->power.completion);
- if (is_async(dev)) {
- get_device(dev);
- async_schedule_dev(func, dev);
+ if (!is_async(dev))
+ return false;
+
+ get_device(dev);
+
+ if (async_schedule_dev_nocall(func, dev))
return true;
- }
+
+ put_device(dev);
return false;
}
@@ -680,15 +690,19 @@ static bool dpm_async_fn(struct device *
static void async_resume_noirq(void *data, async_cookie_t cookie)
{
struct device *dev = data;
- int error;
-
- error = device_resume_noirq(dev, pm_transition, true);
- if (error)
- pm_dev_err(dev, pm_transition, " async", error);
+ __device_resume_noirq(dev, pm_transition, true);
put_device(dev);
}
+static void device_resume_noirq(struct device *dev)
+{
+ if (dpm_async_fn(dev, async_resume_noirq))
+ return;
+
+ __device_resume_noirq(dev, pm_transition, false);
+}
+
static void dpm_noirq_resume_devices(pm_message_t state)
{
struct device *dev;
@@ -698,14 +712,6 @@ static void dpm_noirq_resume_devices(pm_
mutex_lock(&dpm_list_mtx);
pm_transition = state;
- /*
- * Advanced the async threads upfront,
- * in case the starting of async threads is
- * delayed by non-async resuming devices.
- */
- list_for_each_entry(dev, &dpm_noirq_list, power.entry)
- dpm_async_fn(dev, async_resume_noirq);
-
while (!list_empty(&dpm_noirq_list)) {
dev = to_device(dpm_noirq_list.next);
get_device(dev);
@@ -713,17 +719,7 @@ static void dpm_noirq_resume_devices(pm_
mutex_unlock(&dpm_list_mtx);
- if (!is_async(dev)) {
- int error;
-
- error = device_resume_noirq(dev, state, false);
- if (error) {
- suspend_stats.failed_resume_noirq++;
- dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
- dpm_save_failed_dev(dev_name(dev));
- pm_dev_err(dev, state, " noirq", error);
- }
- }
+ device_resume_noirq(dev);
put_device(dev);
@@ -751,14 +747,14 @@ void dpm_resume_noirq(pm_message_t state
}
/**
- * device_resume_early - Execute an "early resume" callback for given device.
+ * __device_resume_early - Execute an "early resume" callback for given device.
* @dev: Device to handle.
* @state: PM transition of the system being carried out.
* @async: If true, the device is being resumed asynchronously.
*
* Runtime PM is disabled for @dev while this function is being executed.
*/
-static int device_resume_early(struct device *dev, pm_message_t state, bool async)
+static void __device_resume_early(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback = NULL;
const char *info = NULL;
@@ -811,21 +807,31 @@ Out:
pm_runtime_enable(dev);
complete_all(&dev->power.completion);
- return error;
+
+ if (error) {
+ suspend_stats.failed_resume_early++;
+ dpm_save_failed_step(SUSPEND_RESUME_EARLY);
+ dpm_save_failed_dev(dev_name(dev));
+ pm_dev_err(dev, state, async ? " async early" : " early", error);
+ }
}
static void async_resume_early(void *data, async_cookie_t cookie)
{
struct device *dev = data;
- int error;
-
- error = device_resume_early(dev, pm_transition, true);
- if (error)
- pm_dev_err(dev, pm_transition, " async", error);
+ __device_resume_early(dev, pm_transition, true);
put_device(dev);
}
+static void device_resume_early(struct device *dev)
+{
+ if (dpm_async_fn(dev, async_resume_early))
+ return;
+
+ __device_resume_early(dev, pm_transition, false);
+}
+
/**
* dpm_resume_early - Execute "early resume" callbacks for all devices.
* @state: PM transition of the system being carried out.
@@ -839,14 +845,6 @@ void dpm_resume_early(pm_message_t state
mutex_lock(&dpm_list_mtx);
pm_transition = state;
- /*
- * Advanced the async threads upfront,
- * in case the starting of async threads is
- * delayed by non-async resuming devices.
- */
- list_for_each_entry(dev, &dpm_late_early_list, power.entry)
- dpm_async_fn(dev, async_resume_early);
-
while (!list_empty(&dpm_late_early_list)) {
dev = to_device(dpm_late_early_list.next);
get_device(dev);
@@ -854,17 +852,7 @@ void dpm_resume_early(pm_message_t state
mutex_unlock(&dpm_list_mtx);
- if (!is_async(dev)) {
- int error;
-
- error = device_resume_early(dev, state, false);
- if (error) {
- suspend_stats.failed_resume_early++;
- dpm_save_failed_step(SUSPEND_RESUME_EARLY);
- dpm_save_failed_dev(dev_name(dev));
- pm_dev_err(dev, state, " early", error);
- }
- }
+ device_resume_early(dev);
put_device(dev);
@@ -888,12 +876,12 @@ void dpm_resume_start(pm_message_t state
EXPORT_SYMBOL_GPL(dpm_resume_start);
/**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
* @dev: Device to handle.
* @state: PM transition of the system being carried out.
* @async: If true, the device is being resumed asynchronously.
*/
-static int device_resume(struct device *dev, pm_message_t state, bool async)
+static void __device_resume(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback = NULL;
const char *info = NULL;
@@ -975,20 +963,30 @@ static int device_resume(struct device *
TRACE_RESUME(error);
- return error;
+ if (error) {
+ suspend_stats.failed_resume++;
+ dpm_save_failed_step(SUSPEND_RESUME);
+ dpm_save_failed_dev(dev_name(dev));
+ pm_dev_err(dev, state, async ? " async" : "", error);
+ }
}
static void async_resume(void *data, async_cookie_t cookie)
{
struct device *dev = data;
- int error;
- error = device_resume(dev, pm_transition, true);
- if (error)
- pm_dev_err(dev, pm_transition, " async", error);
+ __device_resume(dev, pm_transition, true);
put_device(dev);
}
+static void device_resume(struct device *dev)
+{
+ if (dpm_async_fn(dev, async_resume))
+ return;
+
+ __device_resume(dev, pm_transition, false);
+}
+
/**
* dpm_resume - Execute "resume" callbacks for non-sysdev devices.
* @state: PM transition of the system being carried out.
@@ -1008,27 +1006,17 @@ void dpm_resume(pm_message_t state)
pm_transition = state;
async_error = 0;
- list_for_each_entry(dev, &dpm_suspended_list, power.entry)
- dpm_async_fn(dev, async_resume);
-
while (!list_empty(&dpm_suspended_list)) {
dev = to_device(dpm_suspended_list.next);
+
get_device(dev);
- if (!is_async(dev)) {
- int error;
- mutex_unlock(&dpm_list_mtx);
+ mutex_unlock(&dpm_list_mtx);
- error = device_resume(dev, state, false);
- if (error) {
- suspend_stats.failed_resume++;
- dpm_save_failed_step(SUSPEND_RESUME);
- dpm_save_failed_dev(dev_name(dev));
- pm_dev_err(dev, state, "", error);
- }
+ device_resume(dev);
+
+ mutex_lock(&dpm_list_mtx);
- mutex_lock(&dpm_list_mtx);
- }
if (!list_empty(&dev->power.entry))
list_move_tail(&dev->power.entry, &dpm_prepared_list);
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [BUG] mutex deadlock of dpm_resume() in low memory situation
2023-12-27 18:58 ` Rafael J. Wysocki
@ 2023-12-27 20:50 ` Rafael J. Wysocki
2023-12-28 6:40 ` Youngmin Nam
0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 20:50 UTC (permalink / raw)
To: Greg KH, Youngmin Nam
Cc: len.brown, pavel, linux-pm, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu, Rafael J. Wysocki
On Wed, Dec 27, 2023 at 7:58 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Wednesday, December 27, 2023 7:39:20 PM CET Rafael J. Wysocki wrote:
> > On Wednesday, December 27, 2023 5:08:40 PM CET Greg KH wrote:
> > > On Wed, Dec 27, 2023 at 05:42:50PM +0900, Youngmin Nam wrote:
> > > > Could you look into this issue ?
> > >
> > > Can you submit a patch that resolves the issue for you, as you have a
> > > way to actually test this out? That would be the quickest way to get it
> > > resolved, and to help confirm that this is even an issue at all.
> >
> > Something like the appended patch should be sufficient to address this AFAICS.
> >
> > I haven't tested it yet (will do so shortly), so all of the usual disclaimers
> > apply.
> >
> > I think that it can be split into 2 patches, but for easier testing here
> > it goes in one piece.
>
> Well, please scratch this, it will not handle "async" devices properly.
This
https://lore.kernel.org/linux-pm/6019796.lOV4Wx5bFT@kreacher/
should address the issue properly (it has been lightly tested).
Please give it a go and let me know if it works for you (on top of 6.7-rc7).
* Re: [BUG] mutex deadlock of dpm_resume() in low memory situation
2023-12-27 20:50 ` Rafael J. Wysocki
@ 2023-12-28 6:40 ` Youngmin Nam
0 siblings, 0 replies; 25+ messages in thread
From: Youngmin Nam @ 2023-12-28 6:40 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: len.brown, pavel, linux-pm, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu, Rafael J. Wysocki, Greg KH,
Youngmin Nam, hs.gil
On Wed, Dec 27, 2023 at 09:50:01PM +0100, Rafael J. Wysocki wrote:
> On Wed, Dec 27, 2023 at 7:58 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Wednesday, December 27, 2023 7:39:20 PM CET Rafael J. Wysocki wrote:
> > > On Wednesday, December 27, 2023 5:08:40 PM CET Greg KH wrote:
> > > > On Wed, Dec 27, 2023 at 05:42:50PM +0900, Youngmin Nam wrote:
> > > > > Could you look into this issue ?
> > > >
> > > > Can you submit a patch that resolves the issue for you, as you have a
> > > > way to actually test this out? That would be the quickest way to get it
> > > > resolved, and to help confirm that this is even an issue at all.
> > >
> > > Something like the appended patch should be sufficient to address this AFAICS.
> > >
> > > I haven't tested it yet (will do so shortly), so all of the usual disclaimers
> > > apply.
> > >
> > > I think that it can be split into 2 patches, but for easier testing here
> > > it goes in one piece.
> >
> > Well, please scratch this, it will not handle "async" devices properly.
>
> This
>
> https://lore.kernel.org/linux-pm/6019796.lOV4Wx5bFT@kreacher/
>
> should address the issue properly (it has been lightly tested).
>
> Please give it a go and let me know if it works for you (on top of 6.7-rc7).
>
Thanks for your help.
Actually, this is a very rare case on our side, and it is the first time we have seen this issue.
Anyway, let me test it and let you know.
* Re: [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2023-12-27 20:38 ` [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall() Rafael J. Wysocki
@ 2023-12-28 20:29 ` Stanislaw Gruszka
2023-12-29 13:37 ` Rafael J. Wysocki
0 siblings, 1 reply; 25+ messages in thread
From: Stanislaw Gruszka @ 2023-12-28 20:29 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Greg KH, linux-pm, Youngmin Nam, rafael, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu, Alan Stern, Ulf Hansson
On Wed, Dec 27, 2023 at 09:38:23PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> In preparation for subsequent changes, introduce a specialized variant
> of async_schedule_dev() that will not invoke the argument function
> synchronously when it cannot be scheduled for asynchronous execution.
>
> The new function, async_schedule_dev_nocall(), will be used for fixing
> possible deadlocks in the system-wide power management core code.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/base/power/main.c | 12 ++++++++----
> include/linux/async.h | 2 ++
> kernel/async.c | 29 +++++++++++++++++++++++++++++
> 3 files changed, 39 insertions(+), 4 deletions(-)
>
> Index: linux-pm/kernel/async.c
> ===================================================================
> --- linux-pm.orig/kernel/async.c
> +++ linux-pm/kernel/async.c
> @@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async
> EXPORT_SYMBOL_GPL(async_schedule_node);
>
> /**
> + * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
> + * @func: function to execute asynchronously
> + * @dev: device argument to be passed to function
> + *
> + * @dev is used as both the argument for the function and to provide NUMA
> + * context for where to run the function.
> + *
> + * If the asynchronous execution of @func is scheduled successfully, return
> + * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
> + * that will run the function synchronously then.
> + */
> +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> +{
> + struct async_entry *entry;
> +
> + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
Is GFP_KERNEL intended here? I think it's not safe, since this will
be called from device_resume_noirq().
Regards
Stanislaw
* Re: [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2023-12-29 13:37 ` Rafael J. Wysocki
@ 2023-12-29 3:08 ` Stanislaw Gruszka
2023-12-29 16:36 ` Rafael J. Wysocki
0 siblings, 1 reply; 25+ messages in thread
From: Stanislaw Gruszka @ 2023-12-29 3:08 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Greg KH, linux-pm, Youngmin Nam, linux-kernel,
d7271.choe, janghyuck.kim, hyesoo.yu, Alan Stern, Ulf Hansson
On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > +{
> > > + struct async_entry *entry;
> > > +
> > > + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> >
> > Is GFP_KERNEL intended here ?
>
> Yes, it is.
>
> PM will be the only user of this, at least for now, and it all runs in
> process context.
>
> > I think it's not safe since will
> > be called from device_resume_noirq() .
>
> device_resume_noirq() runs in process context too.
>
> The name is somewhat confusing (sorry about that) and it means that
> hardirq handlers (for the majority of IRQs) don't run in that resume
> phase, but interrupts are enabled locally on all CPUs (this is
> required for wakeup handling, among other things).
Then my concern would be: what if there are disk devices among the
devices with disabled IRQs? It seems there are, and because a GFP_KERNEL
allocation can start reclaiming memory by doing disk I/O (writing out
dirty pages, for example), with the disk driver's interrupts disabled
the reclaim can never finish.
I do not see how such a potentially infinite wait for disk I/O
is prevented here; did I miss something?
Regards
Stanislaw
* Re: [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2023-12-28 20:29 ` Stanislaw Gruszka
@ 2023-12-29 13:37 ` Rafael J. Wysocki
2023-12-29 3:08 ` Stanislaw Gruszka
0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-29 13:37 UTC (permalink / raw)
To: Stanislaw Gruszka
Cc: Rafael J. Wysocki, Greg KH, linux-pm, Youngmin Nam, rafael,
linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu, Alan Stern,
Ulf Hansson
On Fri, Dec 29, 2023 at 8:02 AM Stanislaw Gruszka
<stanislaw.gruszka@linux.intel.com> wrote:
>
> On Wed, Dec 27, 2023 at 09:38:23PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > In preparation for subsequent changes, introduce a specialized variant
> > of async_schedule_dev() that will not invoke the argument function
> > synchronously when it cannot be scheduled for asynchronous execution.
> >
> > The new function, async_schedule_dev_nocall(), will be used for fixing
> > possible deadlocks in the system-wide power management core code.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > drivers/base/power/main.c | 12 ++++++++----
> > include/linux/async.h | 2 ++
> > kernel/async.c | 29 +++++++++++++++++++++++++++++
> > 3 files changed, 39 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/kernel/async.c
> > ===================================================================
> > --- linux-pm.orig/kernel/async.c
> > +++ linux-pm/kernel/async.c
> > @@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async
> > EXPORT_SYMBOL_GPL(async_schedule_node);
> >
> > /**
> > + * async_schedule_dev_nocall - A simplified variant of async_schedule_dev()
> > + * @func: function to execute asynchronously
> > + * @dev: device argument to be passed to function
> > + *
> > + * @dev is used as both the argument for the function and to provide NUMA
> > + * context for where to run the function.
> > + *
> > + * If the asynchronous execution of @func is scheduled successfully, return
> > + * true. Otherwise, do nothing and return false, unlike async_schedule_dev()
> > + * that will run the function synchronously then.
> > + */
> > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > +{
> > + struct async_entry *entry;
> > +
> > + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
>
> Is GFP_KERNEL intended here ?
Yes, it is.
PM will be the only user of this, at least for now, and it all runs in
process context.
> I think it's not safe since will
> be called from device_resume_noirq() .
device_resume_noirq() runs in process context too.
The name is somewhat confusing (sorry about that) and it means that
hardirq handlers (for the majority of IRQs) don't run in that resume
phase, but interrupts are enabled locally on all CPUs (this is
required for wakeup handling, among other things).
* Re: [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2023-12-29 3:08 ` Stanislaw Gruszka
@ 2023-12-29 16:36 ` Rafael J. Wysocki
2024-01-02 7:09 ` Stanislaw Gruszka
0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-12-29 16:36 UTC (permalink / raw)
To: Stanislaw Gruszka
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Greg KH, linux-pm,
Youngmin Nam, linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu,
Alan Stern, Ulf Hansson
On Fri, Dec 29, 2023 at 3:54 PM Stanislaw Gruszka
<stanislaw.gruszka@linux.intel.com> wrote:
>
> On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > > +{
> > > > + struct async_entry *entry;
> > > > +
> > > > + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> > >
> > > Is GFP_KERNEL intended here ?
> >
> > Yes, it is.
> >
> > PM will be the only user of this, at least for now, and it all runs in
> > process context.
> >
> > > I think it's not safe since will
> > > be called from device_resume_noirq() .
> >
> > device_resume_noirq() runs in process context too.
> >
> > The name is somewhat confusing (sorry about that) and it means that
> > hardirq handlers (for the majority of IRQs) don't run in that resume
> > phase, but interrupts are enabled locally on all CPUs (this is
> > required for wakeup handling, among other things).
>
> Then my concern would be: if among devices with disabled IRQs are
> disk devices? Seems there are disk devices as well, and because
> GFP_KERNEL can start reclaiming memory by doing disk IO (write
> dirty pages for example), with disk driver interrupts disabled
> reclaiming process can not finish.
>
> I do not see how such possible infinite waiting for disk IO
> scenario is prevented here, did I miss something?
Well, it is not a concern, because the suspend code already prevents
the mm subsystem from trying too hard to find free memory. See the
pm_restrict_gfp_mask() call in enter_state().
Otherwise, it would have been a problem for any GFP_KERNEL allocations
made during system-wide suspend-resume, not just in the _noirq phases.
* Re: [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2023-12-29 16:36 ` Rafael J. Wysocki
@ 2024-01-02 7:09 ` Stanislaw Gruszka
2024-01-02 13:15 ` Rafael J. Wysocki
0 siblings, 1 reply; 25+ messages in thread
From: Stanislaw Gruszka @ 2024-01-02 7:09 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Greg KH, linux-pm, Youngmin Nam, linux-kernel,
d7271.choe, janghyuck.kim, hyesoo.yu, Alan Stern, Ulf Hansson
On Fri, Dec 29, 2023 at 05:36:01PM +0100, Rafael J. Wysocki wrote:
> On Fri, Dec 29, 2023 at 3:54 PM Stanislaw Gruszka
> <stanislaw.gruszka@linux.intel.com> wrote:
> >
> > On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > > > +{
> > > > > + struct async_entry *entry;
> > > > > +
> > > > > + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> > > >
> > > > Is GFP_KERNEL intended here ?
> > >
> > > Yes, it is.
> > >
> > > PM will be the only user of this, at least for now, and it all runs in
> > > process context.
> > >
> > > > I think it's not safe since will
> > > > be called from device_resume_noirq() .
> > >
> > > device_resume_noirq() runs in process context too.
> > >
> > > The name is somewhat confusing (sorry about that) and it means that
> > > hardirq handlers (for the majority of IRQs) don't run in that resume
> > > phase, but interrupts are enabled locally on all CPUs (this is
> > > required for wakeup handling, among other things).
> >
> > Then my concern would be: if among devices with disabled IRQs are
> > disk devices? Seems there are disk devices as well, and because
> > GFP_KERNEL can start reclaiming memory by doing disk IO (write
> > dirty pages for example), with disk driver interrupts disabled
> > reclaiming process can not finish.
> >
> > I do not see how such possible infinite waiting for disk IO
> > scenario is prevented here, did I miss something?
>
> Well, it is not a concern, because the suspend code already prevents
> the mm subsystem from trying too hard to find free memory. See the
> pm_restrict_gfp_mask() call in enter_state().
So that's what I missed :-) Thanks for the explanations.
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> for the series.
* Re: [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall()
2024-01-02 7:09 ` Stanislaw Gruszka
@ 2024-01-02 13:15 ` Rafael J. Wysocki
0 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2024-01-02 13:15 UTC (permalink / raw)
To: Stanislaw Gruszka
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Greg KH, linux-pm,
Youngmin Nam, linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu,
Alan Stern, Ulf Hansson
On Tue, Jan 2, 2024 at 8:10 AM Stanislaw Gruszka
<stanislaw.gruszka@linux.intel.com> wrote:
>
> On Fri, Dec 29, 2023 at 05:36:01PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Dec 29, 2023 at 3:54 PM Stanislaw Gruszka
> > <stanislaw.gruszka@linux.intel.com> wrote:
> > >
> > > On Fri, Dec 29, 2023 at 02:37:36PM +0100, Rafael J. Wysocki wrote:
> > > > > > +bool async_schedule_dev_nocall(async_func_t func, struct device *dev)
> > > > > > +{
> > > > > > + struct async_entry *entry;
> > > > > > +
> > > > > > + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL);
> > > > >
> > > > > Is GFP_KERNEL intended here ?
> > > >
> > > > Yes, it is.
> > > >
> > > > PM will be the only user of this, at least for now, and it all runs in
> > > > process context.
> > > >
> > > > > I think it's not safe since will
> > > > > be called from device_resume_noirq() .
> > > >
> > > > device_resume_noirq() runs in process context too.
> > > >
> > > > The name is somewhat confusing (sorry about that) and it means that
> > > > hardirq handlers (for the majority of IRQs) don't run in that resume
> > > > phase, but interrupts are enabled locally on all CPUs (this is
> > > > required for wakeup handling, among other things).
> > >
> > > Then my concern would be: if among devices with disabled IRQs are
> > > disk devices? Seems there are disk devices as well, and because
> > > GFP_KERNEL can start reclaiming memory by doing disk IO (write
> > > dirty pages for example), with disk driver interrupts disabled
> > > reclaiming process can not finish.
> > >
> > > I do not see how such possible infinite waiting for disk IO
> > > scenario is prevented here, did I miss something?
> >
> > Well, it is not a concern, because the suspend code already prevents
> > the mm subsystem from trying too hard to find free memory. See the
> > pm_restrict_gfp_mask() call in enter_state().
>
> So that I missed :-) Thanks for explanations.
>
> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> for the series.
Thank you!
* Re: [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
` (2 preceding siblings ...)
2023-12-27 20:41 ` [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code Rafael J. Wysocki
@ 2024-01-02 13:18 ` Rafael J. Wysocki
2024-01-03 4:39 ` Youngmin Nam
3 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2024-01-02 13:18 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Greg KH, linux-pm, Youngmin Nam, rafael, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu, Alan Stern, Ulf Hansson
On Wed, Dec 27, 2023 at 9:41 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Hi Everyone,
>
> As reported here
>
> https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
>
> the device suspend-resume code running during system-wide PM transitions
> deadlock on low memory, because it attempts to acquire a mutex that's
> already held by it in those cases.
>
> This series addresses the issue by changing the resume code behavior
> to directly run the device PM functions synchronously if they cannot
> be scheduled for asynchronous executions (patch [3/3]).
>
> For this purpose, the async code is rearranged (patch [1/3]) and a
> new variant of async_schedule_dev() is introduced (patch [2/3]).
Given the lack of negative feedback, I've queued up this series for 6.8-rc1.
Please let me know if there are any issues with that.
Thanks!
* Re: [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code
2023-12-27 20:41 ` [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code Rafael J. Wysocki
@ 2024-01-02 13:35 ` Ulf Hansson
2024-01-02 13:53 ` Rafael J. Wysocki
0 siblings, 1 reply; 25+ messages in thread
From: Ulf Hansson @ 2024-01-02 13:35 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Greg KH, linux-pm, Youngmin Nam, rafael, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu, Alan Stern
On Wed, 27 Dec 2023 at 21:41, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It is reported that in low-memory situations the system-wide resume core
> code deadlocks, because async_schedule_dev() executes its argument
> > function synchronously if it cannot allocate memory (and not only then)
> and that function attempts to acquire a mutex that is already held.
>
> Address this by changing the code in question to use
> async_schedule_dev_nocall() for scheduling the asynchronous
> execution of device suspend and resume functions and to directly
> run them synchronously if async_schedule_dev_nocall() returns false.
>
> Fixes: 09beebd8f93b ("PM: sleep: core: Switch back to async_schedule_dev()")
> Link: https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
> Reported-by: Youngmin Nam <youngmin.nam@samsung.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> The commit pointed to by the Fixes: tag is the last one that modified
> the code in question, even though the bug had been there already before.
>
> Still, the fix will not apply to the code before that commit.
An option could be to just do "Cc: stable@vger.kernel.org # v5.7+"
instead of pointing to a commit with a Fixes tag.
>
> ---
> drivers/base/power/main.c | 148 +++++++++++++++++++++-------------------------
> 1 file changed, 68 insertions(+), 80 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -579,7 +579,7 @@ bool dev_pm_skip_resume(struct device *d
> }
>
> /**
> - * device_resume_noirq - Execute a "noirq resume" callback for given device.
> + * __device_resume_noirq - Execute a "noirq resume" callback for given device.
> * @dev: Device to handle.
> * @state: PM transition of the system being carried out.
> * @async: If true, the device is being resumed asynchronously.
> @@ -587,7 +587,7 @@ bool dev_pm_skip_resume(struct device *d
> * The driver of @dev will not receive interrupts while this function is being
> * executed.
> */
> -static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
> +static void __device_resume_noirq(struct device *dev, pm_message_t state, bool async)
> {
> pm_callback_t callback = NULL;
> const char *info = NULL;
> @@ -655,7 +655,13 @@ Skip:
> Out:
> complete_all(&dev->power.completion);
> TRACE_RESUME(error);
> - return error;
> +
> + if (error) {
> + suspend_stats.failed_resume_noirq++;
> + dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
> + dpm_save_failed_dev(dev_name(dev));
> + pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
> + }
> }
>
> static bool is_async(struct device *dev)
> @@ -668,11 +674,15 @@ static bool dpm_async_fn(struct device *
> {
> reinit_completion(&dev->power.completion);
>
> - if (is_async(dev)) {
> - get_device(dev);
> - async_schedule_dev(func, dev);
> + if (!is_async(dev))
> + return false;
> +
> + get_device(dev);
> +
> + if (async_schedule_dev_nocall(func, dev))
> return true;
> - }
> +
> + put_device(dev);
>
> return false;
> }
> @@ -680,15 +690,19 @@ static bool dpm_async_fn(struct device *
> static void async_resume_noirq(void *data, async_cookie_t cookie)
> {
> struct device *dev = data;
> - int error;
> -
> - error = device_resume_noirq(dev, pm_transition, true);
> - if (error)
> - pm_dev_err(dev, pm_transition, " async", error);
>
> + __device_resume_noirq(dev, pm_transition, true);
> put_device(dev);
> }
>
> +static void device_resume_noirq(struct device *dev)
> +{
> + if (dpm_async_fn(dev, async_resume_noirq))
> + return;
> +
> + __device_resume_noirq(dev, pm_transition, false);
> +}
> +
> static void dpm_noirq_resume_devices(pm_message_t state)
> {
> struct device *dev;
> @@ -698,14 +712,6 @@ static void dpm_noirq_resume_devices(pm_
> mutex_lock(&dpm_list_mtx);
> pm_transition = state;
>
> - /*
> - * Advanced the async threads upfront,
> - * in case the starting of async threads is
> - * delayed by non-async resuming devices.
> - */
> - list_for_each_entry(dev, &dpm_noirq_list, power.entry)
> - dpm_async_fn(dev, async_resume_noirq);
> -
If I understand correctly, this means that we are no longer going to
run the async devices upfront, right?
Depending on how devices get ordered in the dpm_noirq_list, it sounds
like the above could have a negative impact on the total resume time!?
Of course, if all devices were async capable, this wouldn't be a
problem...
> while (!list_empty(&dpm_noirq_list)) {
> dev = to_device(dpm_noirq_list.next);
> get_device(dev);
> @@ -713,17 +719,7 @@ static void dpm_noirq_resume_devices(pm_
>
> mutex_unlock(&dpm_list_mtx);
>
> - if (!is_async(dev)) {
> - int error;
> -
> - error = device_resume_noirq(dev, state, false);
> - if (error) {
> - suspend_stats.failed_resume_noirq++;
> - dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
> - dpm_save_failed_dev(dev_name(dev));
> - pm_dev_err(dev, state, " noirq", error);
> - }
> - }
> + device_resume_noirq(dev);
>
> put_device(dev);
>
> @@ -751,14 +747,14 @@ void dpm_resume_noirq(pm_message_t state
> }
>
> /**
> - * device_resume_early - Execute an "early resume" callback for given device.
> + * __device_resume_early - Execute an "early resume" callback for given device.
> * @dev: Device to handle.
> * @state: PM transition of the system being carried out.
> * @async: If true, the device is being resumed asynchronously.
> *
> * Runtime PM is disabled for @dev while this function is being executed.
> */
> -static int device_resume_early(struct device *dev, pm_message_t state, bool async)
> +static void __device_resume_early(struct device *dev, pm_message_t state, bool async)
> {
> pm_callback_t callback = NULL;
> const char *info = NULL;
> @@ -811,21 +807,31 @@ Out:
>
> pm_runtime_enable(dev);
> complete_all(&dev->power.completion);
> - return error;
> +
> + if (error) {
> + suspend_stats.failed_resume_early++;
> + dpm_save_failed_step(SUSPEND_RESUME_EARLY);
> + dpm_save_failed_dev(dev_name(dev));
> + pm_dev_err(dev, state, async ? " async early" : " early", error);
> + }
> }
>
> static void async_resume_early(void *data, async_cookie_t cookie)
> {
> struct device *dev = data;
> - int error;
> -
> - error = device_resume_early(dev, pm_transition, true);
> - if (error)
> - pm_dev_err(dev, pm_transition, " async", error);
>
> + __device_resume_early(dev, pm_transition, true);
> put_device(dev);
> }
>
> +static void device_resume_early(struct device *dev)
> +{
> + if (dpm_async_fn(dev, async_resume_early))
> + return;
> +
> + __device_resume_early(dev, pm_transition, false);
> +}
> +
> /**
> * dpm_resume_early - Execute "early resume" callbacks for all devices.
> * @state: PM transition of the system being carried out.
> @@ -839,14 +845,6 @@ void dpm_resume_early(pm_message_t state
> mutex_lock(&dpm_list_mtx);
> pm_transition = state;
>
> - /*
> - * Advanced the async threads upfront,
> - * in case the starting of async threads is
> - * delayed by non-async resuming devices.
> - */
> - list_for_each_entry(dev, &dpm_late_early_list, power.entry)
> - dpm_async_fn(dev, async_resume_early);
> -
Ditto.
> while (!list_empty(&dpm_late_early_list)) {
> dev = to_device(dpm_late_early_list.next);
> get_device(dev);
> @@ -854,17 +852,7 @@ void dpm_resume_early(pm_message_t state
>
> mutex_unlock(&dpm_list_mtx);
>
> - if (!is_async(dev)) {
> - int error;
> -
> - error = device_resume_early(dev, state, false);
> - if (error) {
> - suspend_stats.failed_resume_early++;
> - dpm_save_failed_step(SUSPEND_RESUME_EARLY);
> - dpm_save_failed_dev(dev_name(dev));
> - pm_dev_err(dev, state, " early", error);
> - }
> - }
> + device_resume_early(dev);
>
> put_device(dev);
>
> @@ -888,12 +876,12 @@ void dpm_resume_start(pm_message_t state
> EXPORT_SYMBOL_GPL(dpm_resume_start);
>
> /**
> - * device_resume - Execute "resume" callbacks for given device.
> + * __device_resume - Execute "resume" callbacks for given device.
> * @dev: Device to handle.
> * @state: PM transition of the system being carried out.
> * @async: If true, the device is being resumed asynchronously.
> */
> -static int device_resume(struct device *dev, pm_message_t state, bool async)
> +static void __device_resume(struct device *dev, pm_message_t state, bool async)
> {
> pm_callback_t callback = NULL;
> const char *info = NULL;
> @@ -975,20 +963,30 @@ static int device_resume(struct device *
>
> TRACE_RESUME(error);
>
> - return error;
> + if (error) {
> + suspend_stats.failed_resume++;
> + dpm_save_failed_step(SUSPEND_RESUME);
> + dpm_save_failed_dev(dev_name(dev));
> + pm_dev_err(dev, state, async ? " async" : "", error);
> + }
> }
>
> static void async_resume(void *data, async_cookie_t cookie)
> {
> struct device *dev = data;
> - int error;
>
> - error = device_resume(dev, pm_transition, true);
> - if (error)
> - pm_dev_err(dev, pm_transition, " async", error);
> + __device_resume(dev, pm_transition, true);
> put_device(dev);
> }
>
> +static void device_resume(struct device *dev)
> +{
> + if (dpm_async_fn(dev, async_resume))
> + return;
> +
> + __device_resume(dev, pm_transition, false);
> +}
> +
> /**
> * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
> * @state: PM transition of the system being carried out.
> @@ -1008,27 +1006,17 @@ void dpm_resume(pm_message_t state)
> pm_transition = state;
> async_error = 0;
>
> - list_for_each_entry(dev, &dpm_suspended_list, power.entry)
> - dpm_async_fn(dev, async_resume);
> -
Ditto.
> while (!list_empty(&dpm_suspended_list)) {
> dev = to_device(dpm_suspended_list.next);
> +
> get_device(dev);
> - if (!is_async(dev)) {
> - int error;
>
> - mutex_unlock(&dpm_list_mtx);
> + mutex_unlock(&dpm_list_mtx);
>
> - error = device_resume(dev, state, false);
> - if (error) {
> - suspend_stats.failed_resume++;
> - dpm_save_failed_step(SUSPEND_RESUME);
> - dpm_save_failed_dev(dev_name(dev));
> - pm_dev_err(dev, state, "", error);
> - }
> + device_resume(dev);
> +
> + mutex_lock(&dpm_list_mtx);
>
> - mutex_lock(&dpm_list_mtx);
> - }
> if (!list_empty(&dev->power.entry))
> list_move_tail(&dev->power.entry, &dpm_prepared_list);
>
Other than the potential issue I pointed out, the code as such looks good to me!
Kind regards
Uffe
* Re: [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code
2024-01-02 13:35 ` Ulf Hansson
@ 2024-01-02 13:53 ` Rafael J. Wysocki
2024-01-03 10:17 ` Ulf Hansson
0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2024-01-02 13:53 UTC (permalink / raw)
To: Ulf Hansson
Cc: Rafael J. Wysocki, Greg KH, linux-pm, Youngmin Nam, rafael,
linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu, Alan Stern
On Tue, Jan 2, 2024 at 2:35 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 27 Dec 2023 at 21:41, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > It is reported that in low-memory situations the system-wide resume core
> > code deadlocks, because async_schedule_dev() executes its argument
> > > function synchronously if it cannot allocate memory (and not only then)
> > and that function attempts to acquire a mutex that is already held.
> >
> > Address this by changing the code in question to use
> > async_schedule_dev_nocall() for scheduling the asynchronous
> > execution of device suspend and resume functions and to directly
> > run them synchronously if async_schedule_dev_nocall() returns false.
> >
> > Fixes: 09beebd8f93b ("PM: sleep: core: Switch back to async_schedule_dev()")
> > Link: https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
> > Reported-by: Youngmin Nam <youngmin.nam@samsung.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > The commit pointed to by the Fixes: tag is the last one that modified
> > the code in question, even though the bug had been there already before.
> >
> > Still, the fix will not apply to the code before that commit.
>
> An option could be to just do "Cc: stable@vger.kernel.org # v5.7+"
> instead of pointing to a commit with a Fixes tag.
Right, but one can argue that every commit with a "Cc: stable" tag is
a fix, so it should carry a Fixes: tag too anyway.
> >
> > ---
> > drivers/base/power/main.c | 148 +++++++++++++++++++++-------------------------
> > 1 file changed, 68 insertions(+), 80 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/main.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/main.c
> > +++ linux-pm/drivers/base/power/main.c
> > @@ -579,7 +579,7 @@ bool dev_pm_skip_resume(struct device *d
> > }
> >
> > /**
> > - * device_resume_noirq - Execute a "noirq resume" callback for given device.
> > + * __device_resume_noirq - Execute a "noirq resume" callback for given device.
> > * @dev: Device to handle.
> > * @state: PM transition of the system being carried out.
> > * @async: If true, the device is being resumed asynchronously.
> > @@ -587,7 +587,7 @@ bool dev_pm_skip_resume(struct device *d
> > * The driver of @dev will not receive interrupts while this function is being
> > * executed.
> > */
> > -static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
> > +static void __device_resume_noirq(struct device *dev, pm_message_t state, bool async)
> > {
> > pm_callback_t callback = NULL;
> > const char *info = NULL;
> > @@ -655,7 +655,13 @@ Skip:
> > Out:
> > complete_all(&dev->power.completion);
> > TRACE_RESUME(error);
> > - return error;
> > +
> > + if (error) {
> > + suspend_stats.failed_resume_noirq++;
> > + dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
> > + dpm_save_failed_dev(dev_name(dev));
> > + pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
> > + }
> > }
> >
> > static bool is_async(struct device *dev)
> > @@ -668,11 +674,15 @@ static bool dpm_async_fn(struct device *
> > {
> > reinit_completion(&dev->power.completion);
> >
> > - if (is_async(dev)) {
> > - get_device(dev);
> > - async_schedule_dev(func, dev);
> > + if (!is_async(dev))
> > + return false;
> > +
> > + get_device(dev);
> > +
> > + if (async_schedule_dev_nocall(func, dev))
> > return true;
> > - }
> > +
> > + put_device(dev);
> >
> > return false;
> > }
> > @@ -680,15 +690,19 @@ static bool dpm_async_fn(struct device *
> > static void async_resume_noirq(void *data, async_cookie_t cookie)
> > {
> > struct device *dev = data;
> > - int error;
> > -
> > - error = device_resume_noirq(dev, pm_transition, true);
> > - if (error)
> > - pm_dev_err(dev, pm_transition, " async", error);
> >
> > + __device_resume_noirq(dev, pm_transition, true);
> > put_device(dev);
> > }
> >
> > +static void device_resume_noirq(struct device *dev)
> > +{
> > + if (dpm_async_fn(dev, async_resume_noirq))
> > + return;
> > +
> > + __device_resume_noirq(dev, pm_transition, false);
> > +}
> > +
> > static void dpm_noirq_resume_devices(pm_message_t state)
> > {
> > struct device *dev;
> > @@ -698,14 +712,6 @@ static void dpm_noirq_resume_devices(pm_
> > mutex_lock(&dpm_list_mtx);
> > pm_transition = state;
> >
> > - /*
> > - * Advanced the async threads upfront,
> > - * in case the starting of async threads is
> > - * delayed by non-async resuming devices.
> > - */
> > - list_for_each_entry(dev, &dpm_noirq_list, power.entry)
> > - dpm_async_fn(dev, async_resume_noirq);
> > -
>
> If I understand correctly, this means that we are no longer going to
> run the async devices upfront, right?
Right.
> Depending on how devices get ordered in the dpm_noirq_list, it sounds
> like the above could have a negative impact on the total resume time!?
It could, but it is unclear at this time whether or not it will.
> Of course, if all devices would be async capable this wouldn't be a
> problem...
Sure.
So the existing behavior can be restored with the help of an
additional device flag, but I didn't decide to add such a flag just
yet.
I'll probably do it in 6.9, unless the performance impact is serious
enough, in which case it can be added earlier.
I still would prefer to get to a point at which the suspend and resume
paths are analogous (from the async POV) and that's what happens after
this patch, so I'd say that IMO it is better to address any
performance regressions on top of it.
> > while (!list_empty(&dpm_noirq_list)) {
> > dev = to_device(dpm_noirq_list.next);
> > get_device(dev);
> > @@ -713,17 +719,7 @@ static void dpm_noirq_resume_devices(pm_
> >
> > mutex_unlock(&dpm_list_mtx);
> >
> > - if (!is_async(dev)) {
> > - int error;
> > -
> > - error = device_resume_noirq(dev, state, false);
> > - if (error) {
> > - suspend_stats.failed_resume_noirq++;
> > - dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
> > - dpm_save_failed_dev(dev_name(dev));
> > - pm_dev_err(dev, state, " noirq", error);
> > - }
> > - }
> > + device_resume_noirq(dev);
> >
> > put_device(dev);
> >
> > @@ -751,14 +747,14 @@ void dpm_resume_noirq(pm_message_t state
> > }
> >
> > /**
> > - * device_resume_early - Execute an "early resume" callback for given device.
> > + * __device_resume_early - Execute an "early resume" callback for given device.
> > * @dev: Device to handle.
> > * @state: PM transition of the system being carried out.
> > * @async: If true, the device is being resumed asynchronously.
> > *
> > * Runtime PM is disabled for @dev while this function is being executed.
> > */
> > -static int device_resume_early(struct device *dev, pm_message_t state, bool async)
> > +static void __device_resume_early(struct device *dev, pm_message_t state, bool async)
> > {
> > pm_callback_t callback = NULL;
> > const char *info = NULL;
> > @@ -811,21 +807,31 @@ Out:
> >
> > pm_runtime_enable(dev);
> > complete_all(&dev->power.completion);
> > - return error;
> > +
> > + if (error) {
> > + suspend_stats.failed_resume_early++;
> > + dpm_save_failed_step(SUSPEND_RESUME_EARLY);
> > + dpm_save_failed_dev(dev_name(dev));
> > + pm_dev_err(dev, state, async ? " async early" : " early", error);
> > + }
> > }
> >
> > static void async_resume_early(void *data, async_cookie_t cookie)
> > {
> > struct device *dev = data;
> > - int error;
> > -
> > - error = device_resume_early(dev, pm_transition, true);
> > - if (error)
> > - pm_dev_err(dev, pm_transition, " async", error);
> >
> > + __device_resume_early(dev, pm_transition, true);
> > put_device(dev);
> > }
> >
> > +static void device_resume_early(struct device *dev)
> > +{
> > + if (dpm_async_fn(dev, async_resume_early))
> > + return;
> > +
> > + __device_resume_early(dev, pm_transition, false);
> > +}
> > +
> > /**
> > * dpm_resume_early - Execute "early resume" callbacks for all devices.
> > * @state: PM transition of the system being carried out.
> > @@ -839,14 +845,6 @@ void dpm_resume_early(pm_message_t state
> > mutex_lock(&dpm_list_mtx);
> > pm_transition = state;
> >
> > - /*
> > - * Advanced the async threads upfront,
> > - * in case the starting of async threads is
> > - * delayed by non-async resuming devices.
> > - */
> > - list_for_each_entry(dev, &dpm_late_early_list, power.entry)
> > - dpm_async_fn(dev, async_resume_early);
> > -
>
> Ditto.
>
> > while (!list_empty(&dpm_late_early_list)) {
> > dev = to_device(dpm_late_early_list.next);
> > get_device(dev);
> > @@ -854,17 +852,7 @@ void dpm_resume_early(pm_message_t state
> >
> > mutex_unlock(&dpm_list_mtx);
> >
> > - if (!is_async(dev)) {
> > - int error;
> > -
> > - error = device_resume_early(dev, state, false);
> > - if (error) {
> > - suspend_stats.failed_resume_early++;
> > - dpm_save_failed_step(SUSPEND_RESUME_EARLY);
> > - dpm_save_failed_dev(dev_name(dev));
> > - pm_dev_err(dev, state, " early", error);
> > - }
> > - }
> > + device_resume_early(dev);
> >
> > put_device(dev);
> >
> > @@ -888,12 +876,12 @@ void dpm_resume_start(pm_message_t state
> > EXPORT_SYMBOL_GPL(dpm_resume_start);
> >
> > /**
> > - * device_resume - Execute "resume" callbacks for given device.
> > + * __device_resume - Execute "resume" callbacks for given device.
> > * @dev: Device to handle.
> > * @state: PM transition of the system being carried out.
> > * @async: If true, the device is being resumed asynchronously.
> > */
> > -static int device_resume(struct device *dev, pm_message_t state, bool async)
> > +static void __device_resume(struct device *dev, pm_message_t state, bool async)
> > {
> > pm_callback_t callback = NULL;
> > const char *info = NULL;
> > @@ -975,20 +963,30 @@ static int device_resume(struct device *
> >
> > TRACE_RESUME(error);
> >
> > - return error;
> > + if (error) {
> > + suspend_stats.failed_resume++;
> > + dpm_save_failed_step(SUSPEND_RESUME);
> > + dpm_save_failed_dev(dev_name(dev));
> > + pm_dev_err(dev, state, async ? " async" : "", error);
> > + }
> > }
> >
> > static void async_resume(void *data, async_cookie_t cookie)
> > {
> > struct device *dev = data;
> > - int error;
> >
> > - error = device_resume(dev, pm_transition, true);
> > - if (error)
> > - pm_dev_err(dev, pm_transition, " async", error);
> > + __device_resume(dev, pm_transition, true);
> > put_device(dev);
> > }
> >
> > +static void device_resume(struct device *dev)
> > +{
> > + if (dpm_async_fn(dev, async_resume))
> > + return;
> > +
> > + __device_resume(dev, pm_transition, false);
> > +}
> > +
> > /**
> > * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
> > * @state: PM transition of the system being carried out.
> > @@ -1008,27 +1006,17 @@ void dpm_resume(pm_message_t state)
> > pm_transition = state;
> > async_error = 0;
> >
> > - list_for_each_entry(dev, &dpm_suspended_list, power.entry)
> > - dpm_async_fn(dev, async_resume);
> > -
>
> Ditto.
>
> > while (!list_empty(&dpm_suspended_list)) {
> > dev = to_device(dpm_suspended_list.next);
> > +
> > get_device(dev);
> > - if (!is_async(dev)) {
> > - int error;
> >
> > - mutex_unlock(&dpm_list_mtx);
> > + mutex_unlock(&dpm_list_mtx);
> >
> > - error = device_resume(dev, state, false);
> > - if (error) {
> > - suspend_stats.failed_resume++;
> > - dpm_save_failed_step(SUSPEND_RESUME);
> > - dpm_save_failed_dev(dev_name(dev));
> > - pm_dev_err(dev, state, "", error);
> > - }
> > + device_resume(dev);
> > +
> > + mutex_lock(&dpm_list_mtx);
> >
> > - mutex_lock(&dpm_list_mtx);
> > - }
> > if (!list_empty(&dev->power.entry))
> > list_move_tail(&dev->power.entry, &dpm_prepared_list);
> >
>
> Other than the potential issue I pointed out, the code as such looks good to me!
Thank you!
* Re: [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks
2024-01-02 13:18 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
@ 2024-01-03 4:39 ` Youngmin Nam
2024-01-03 10:28 ` Rafael J. Wysocki
0 siblings, 1 reply; 25+ messages in thread
From: Youngmin Nam @ 2024-01-03 4:39 UTC (permalink / raw)
To: Rafael J. Wysocki, Rafael J. Wysocki
Cc: Greg KH, linux-pm, Youngmin Nam, rafael, linux-kernel, d7271.choe,
janghyuck.kim, hyesoo.yu, hs.gil, yulgon.kim, Alan Stern,
Ulf Hansson
On Tue, Jan 02, 2024 at 02:18:43PM +0100, Rafael J. Wysocki wrote:
> On Wed, Dec 27, 2023 at 9:41 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > Hi Everyone,
> >
> > As reported here
> >
> > https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
> >
> > the device suspend-resume code running during system-wide PM transitions
> > deadlocks on low memory, because it attempts to acquire a mutex that's
> > already held by it in those cases.
> >
> > This series addresses the issue by changing the resume code behavior
> > to directly run the device PM functions synchronously if they cannot
> > be scheduled for asynchronous executions (patch [3/3]).
> >
> > For this purpose, the async code is rearranged (patch [1/3]) and a
> > new variant of async_schedule_dev() is introduced (patch [2/3]).
>
> Given the lack of negative feedback, I've queued up this series for 6.8-rc1.
>
> Please let me know if there are any issues with that.
>
> Thanks!
>
Hi Rafael
We haven't seen any regression issues under our stress testing.
So, feel free to add
Tested-by: Youngmin Nam <youngmin.nam@samsung.com>
Thanks.
* Re: [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code
2024-01-02 13:53 ` Rafael J. Wysocki
@ 2024-01-03 10:17 ` Ulf Hansson
2024-01-03 10:27 ` Rafael J. Wysocki
2024-01-03 10:33 ` Greg KH
0 siblings, 2 replies; 25+ messages in thread
From: Ulf Hansson @ 2024-01-03 10:17 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Greg KH, linux-pm, Youngmin Nam, linux-kernel,
d7271.choe, janghyuck.kim, hyesoo.yu, Alan Stern
On Tue, 2 Jan 2024 at 14:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jan 2, 2024 at 2:35 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Wed, 27 Dec 2023 at 21:41, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > It is reported that in low-memory situations the system-wide resume core
> > > code deadlocks, because async_schedule_dev() executes its argument
> > > function synchronously if it cannot allocate memory (and not only then)
> > > and that function attempts to acquire a mutex that is already held.
> > >
> > > Address this by changing the code in question to use
> > > async_schedule_dev_nocall() for scheduling the asynchronous
> > > execution of device suspend and resume functions and to directly
> > > run them synchronously if async_schedule_dev_nocall() returns false.
> > >
> > > Fixes: 09beebd8f93b ("PM: sleep: core: Switch back to async_schedule_dev()")
> > > Link: https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/
> > > Reported-by: Youngmin Nam <youngmin.nam@samsung.com>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > The commit pointed to by the Fixes: tag is the last one that modified
> > > the code in question, even though the bug had been there already before.
> > >
> > > Still, the fix will not apply to the code before that commit.
> >
> > An option could be to just do "Cc: stable@vger.kernel.org # v5.7+"
> > instead of pointing to a commit with a Fixes tag.
>
> Right, but one can argue that every commit with a "Cc: stable" tag is
> a fix, so it should carry a Fixes: tag too anyway.
Yes, certainly. But in this case it's more questionable as it's not
really fixing the commit it points out.
Note that, I have no strong opinion here, but maybe Greg has a preferred way?
>
> [...]
> >
> So the existing behavior can be restored with the help of an
> additional device flag, but I didn't decide to add such a flag just
> yet.
>
> I'll probably do it in 6.9, unless the performance impact is serious
> enough, in which case it can be added earlier.
>
> I still would prefer to get to a point at which the suspend and resume
> paths are analogous (from the async POV) and that's what happens after
> this patch, so I'd say that IMO it is better to address any
> performance regressions on top of it.
Fair enough!
[...]
Feel free to add my Reviewed-by for the series!
Kind regards
Uffe
* Re: [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code
2024-01-03 10:17 ` Ulf Hansson
@ 2024-01-03 10:27 ` Rafael J. Wysocki
2024-01-03 10:33 ` Greg KH
1 sibling, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2024-01-03 10:27 UTC (permalink / raw)
To: Ulf Hansson
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Greg KH, linux-pm,
Youngmin Nam, linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu,
Alan Stern
On Wed, Jan 3, 2024 at 11:17 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 2 Jan 2024 at 14:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > [...]
> > > > The commit pointed to by the Fixes: tag is the last one that modified
> > > > the code in question, even though the bug had been there already before.
> > > >
> > > > Still, the fix will not apply to the code before that commit.
> > >
> > > An option could be to just do "Cc: stable@vger.kernel.org # v5.7+"
> > > instead of pointing to a commit with a Fixes tag.
> >
> > Right, but one can argue that every commit with a "Cc: stable" tag is
> > a fix, so it should carry a Fixes: tag too anyway.
>
> Yes, certainly. But in this case it's more questionable as it's not
> really fixing the commit it points out.
>
> Note that, I have no strong opinion here, but maybe Greg has a preferred way?
>
> > [...]
> >
> > So the existing behavior can be restored with the help of an
> > additional device flag, but I didn't decide to add such a flag just
> > yet.
> >
> > I'll probably do it in 6.9, unless the performance impact is serious
> > enough, in which case it can be added earlier.
> >
> > I still would prefer to get to a point at which the suspend and resume
> > paths are analogous (from the async POV) and that's what happens after
> > this patch, so I'd say that IMO it is better to address any
> > performance regressions on top of it.
>
> Fair enough!
>
> [...]
>
> Feel free to add my Reviewed-by for the series!
Thanks!
* Re: [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks
2024-01-03 4:39 ` Youngmin Nam
@ 2024-01-03 10:28 ` Rafael J. Wysocki
0 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2024-01-03 10:28 UTC (permalink / raw)
To: Youngmin Nam
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Greg KH, linux-pm,
linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu, hs.gil,
yulgon.kim, Alan Stern, Ulf Hansson
On Wed, Jan 3, 2024 at 5:39 AM Youngmin Nam <youngmin.nam@samsung.com> wrote:
>
> On Tue, Jan 02, 2024 at 02:18:43PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Dec 27, 2023 at 9:41 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > [...]
> >
> > Given the lack of negative feedback, I've queued up this series for 6.8-rc1.
> >
> > Please let me know if there are any issues with that.
> >
> > Thanks!
> >
> Hi Rafael
>
> We haven't seen any regression issue under our stress test.
>
> So, feel free to add
>
> Tested-by: Youngmin Nam <youngmin.nam@samsung.com>
Thank you!
* Re: [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code
2024-01-03 10:17 ` Ulf Hansson
2024-01-03 10:27 ` Rafael J. Wysocki
@ 2024-01-03 10:33 ` Greg KH
1 sibling, 0 replies; 25+ messages in thread
From: Greg KH @ 2024-01-03 10:33 UTC (permalink / raw)
To: Ulf Hansson
Cc: Rafael J. Wysocki, Rafael J. Wysocki, linux-pm, Youngmin Nam,
linux-kernel, d7271.choe, janghyuck.kim, hyesoo.yu, Alan Stern
On Wed, Jan 03, 2024 at 11:17:18AM +0100, Ulf Hansson wrote:
> On Tue, 2 Jan 2024 at 14:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Jan 2, 2024 at 2:35 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > [...]
> > > > the code in question, even though the bug had been there already before.
> > > >
> > > > Still, the fix will not apply to the code before that commit.
> > >
> > > An option could be to just do "Cc: stable@vger.kernel.org # v5.7+"
> > > instead of pointing to a commit with a Fixes tag.
> >
> > Right, but one can argue that every commit with a "Cc: stable" tag is
> > a fix, so it should carry a Fixes: tag too anyway.
>
> Yes, certainly. But in this case it's more questionable as it's not
> really fixing the commit it points out.
>
> Note that, I have no strong opinion here, but maybe Greg has a preferred way?
Either is fine with me, just to give me a good hint on how far back a
commit should be backported to.
thanks,
greg k-h
Thread overview: 25+ messages
[not found] <CGME20231227084252epcas2p3b063f7852f81f82cd0a31afd7f404db4@epcas2p3.samsung.com>
2023-12-27 8:42 ` [BUG] mutex deadlock of dpm_resume() in low memory situation Youngmin Nam
2023-12-27 16:08 ` Greg KH
2023-12-27 17:44 ` Rafael J. Wysocki
2023-12-27 18:39 ` Rafael J. Wysocki
2023-12-27 18:58 ` Rafael J. Wysocki
2023-12-27 20:50 ` Rafael J. Wysocki
2023-12-28 6:40 ` Youngmin Nam
2023-12-27 20:35 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
2023-12-27 20:37 ` [PATCH v1 1/3] async: Split async_schedule_node_domain() Rafael J. Wysocki
2023-12-27 20:38 ` [PATCH v1 2/3] async: Introduce async_schedule_dev_nocall() Rafael J. Wysocki
2023-12-28 20:29 ` Stanislaw Gruszka
2023-12-29 13:37 ` Rafael J. Wysocki
2023-12-29 3:08 ` Stanislaw Gruszka
2023-12-29 16:36 ` Rafael J. Wysocki
2024-01-02 7:09 ` Stanislaw Gruszka
2024-01-02 13:15 ` Rafael J. Wysocki
2023-12-27 20:41 ` [PATCH v1 3/3] PM: sleep: Fix possible deadlocks in core system-wide PM code Rafael J. Wysocki
2024-01-02 13:35 ` Ulf Hansson
2024-01-02 13:53 ` Rafael J. Wysocki
2024-01-03 10:17 ` Ulf Hansson
2024-01-03 10:27 ` Rafael J. Wysocki
2024-01-03 10:33 ` Greg KH
2024-01-02 13:18 ` [PATCH v1 0/3] PM: sleep: Fix possible device suspend-resume deadlocks Rafael J. Wysocki
2024-01-03 4:39 ` Youngmin Nam
2024-01-03 10:28 ` Rafael J. Wysocki