public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
From: Konrad Dybcio <konrad.dybcio@linaro.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>,
	Marek Szyprowski <m.szyprowski@samsung.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>,
	Ulf Hansson <ulf.hansson@linaro.org>, Tejun Heo <tj@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Naohiro.Aota@wdc.com, kernel-team@meta.com,
	Konrad Dybcio <konradybcio@kernel.org>,
	Bjorn Andersson <andersson@kernel.org>
Subject: Re: [PATCH v1] PM: sleep: Restore asynchronous device resume optimization
Date: Sat, 10 Feb 2024 15:04:08 +0100	[thread overview]
Message-ID: <dde25579-2f6e-4357-bce2-108061460431@linaro.org> (raw)
In-Reply-To: <CAJZ5v0g7VNdRtYsjAf-X24KtmnLzSXqgroReXAOeOQ+Myi2_Rw@mail.gmail.com>



On 2/7/24 11:59, Rafael J. Wysocki wrote:
> On Wed, Feb 7, 2024 at 11:31 AM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>>
>> Dear All,
>>
>> On 09.01.2024 17:59, Rafael J. Wysocki wrote:
>>>    From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Before commit 7839d0078e0d ("PM: sleep: Fix possible deadlocks in core
>>> system-wide PM code"), the resume of devices that were allowed to resume
>>> asynchronously was scheduled before starting the resume of the other
>>> devices, so the former did not have to wait for the latter unless
>>> functional dependencies were present.
>>>
>>> Commit 7839d0078e0d removed that optimization in order to address a
>>> correctness issue, but it can be restored with the help of a new device
>>> power management flag, so do that now.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>
>> This patch finally landed in linux-next some time ago as 3e999770ac1c
>> ("PM: sleep: Restore asynchronous device resume optimization"). Recently
>> I found that it causes a non-trivial interaction with commit
>> 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement
>> for unbound workqueues"). Since merge commit 954350a5f8db in linux-next
>> system suspend/resume fails (board doesn't wake up) on my old Samsung
>> Exynos4412-based Odroid-U3 board (ARM 32bit based), which was rock
>> stable for last years.
>>
>> My further investigations confirmed that the mentioned commits are
>> responsible for this issue. Each of them separately (3e999770ac1c and
>> 5797b1c18919) doesn't trigger any problems. Reverting any of them on top
>> of linux-next (with some additional commit due to code dependencies)
>> also fixes/hides the problem.
>>
>> Let me know if You need more information or tests on the hardware. I'm
>> open to help debugging this issue.
> 
> So I found this report:
> 
> https://lore.kernel.org/lkml/b3d08cd8-d77f-45dd-a2c3-4a4db5a98dfa@kernel.org/
> 
> which appears to be about the same issue.
> 
> Strictly speaking, the regression is introduced by 5797b1c18919,
> because it is not a mainline commit yet, but the information regarding
> the interaction of it with commit 3e999770ac1c is valuable.
> 
> Essentially, what commit 3e999770ac1c does is to schedule the
> execution of all of the async resume callbacks in a given phase
> upfront, so they can run without waiting for the sync ones to complete
> (except when there is a parent-child or supplier-consumer dependency -
> the latter represented by a device link).
> 
> What seems to be happening after commit 5797b1c18919 is that one (or
> more) of the async callbacks get blocked in the workqueue for some
> reason.
> 
> I would try to replace system_unbound_wq in
> __async_schedule_node_domain() with a dedicated workqueue that is not
> unbound and see what happens.

Thanks Rafael for helping connect the dots!

I did the laziest things imaginable to switch to a WQ that's
not unbound:

diff --git a/kernel/async.c b/kernel/async.c
index 97f224a5257b..37f1204ab4e9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -174,7 +174,7 @@ static async_cookie_t __async_schedule_node_domain(async_func_t func,
         spin_unlock_irqrestore(&async_lock, flags);
  
         /* schedule for execution */
-       queue_work_node(node, system_unbound_wq, &entry->work);
+       queue_work_node(node, system_wq, &entry->work);
  
         
..and the system can now suspend/resume all day long again!

Konrad

      reply	other threads:[~2024-02-10 14:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20240207103144eucas1p16b601a73ff347d2542f8380b25921491@eucas1p1.samsung.com>
2024-01-09 16:59 ` [PATCH v1] PM: sleep: Restore asynchronous device resume optimization Rafael J. Wysocki
2024-01-10 10:37   ` Stanislaw Gruszka
2024-01-10 12:33     ` Rafael J. Wysocki
2024-01-10 14:05       ` Stanislaw Gruszka
2024-01-11  7:58         ` Stanislaw Gruszka
2024-01-11 12:01           ` Rafael J. Wysocki
2024-02-07 10:31   ` Marek Szyprowski
2024-02-07 10:38     ` Rafael J. Wysocki
2024-02-07 11:16       ` Marek Szyprowski
2024-02-07 11:25         ` Rafael J. Wysocki
2024-02-07 16:39           ` Tejun Heo
2024-02-07 18:41             ` Rafael J. Wysocki
2024-02-07 18:55             ` Marek Szyprowski
2024-02-07 18:58               ` Tejun Heo
2024-02-07 19:48                 ` Tejun Heo
2024-02-07 19:54                   ` Rafael J. Wysocki
2024-02-07 21:30                   ` Marek Szyprowski
2024-02-07 21:35                     ` Tejun Heo
2024-02-08  7:47                       ` Marek Szyprowski
2024-02-09  0:20                         ` Tejun Heo
2024-02-12 14:57                           ` Rafael J. Wysocki
2024-02-07 10:59     ` Rafael J. Wysocki
2024-02-10 14:04       ` Konrad Dybcio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dde25579-2f6e-4357-bce2-108061460431@linaro.org \
    --to=konrad.dybcio@linaro.org \
    --cc=Naohiro.Aota@wdc.com \
    --cc=andersson@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=konradybcio@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=nathan@kernel.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=stanislaw.gruszka@linux.intel.com \
    --cc=tj@kernel.org \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox